CVS difference for ai12s/ai12-0242-1.txt

Differences between 1.8 and version 1.9
Log of other versions for file ai12s/ai12-0242-1.txt

--- ai12s/ai12-0242-1.txt	2018/03/30 07:55:08	1.8
+++ ai12s/ai12-0242-1.txt	2018/05/11 05:58:41	1.9
@@ -5284,8 +5284,8 @@
 Sent: Tuesday, January 16, 2018  9:44 AM
 
 > ...
->> As far as explicit parallelism, I now believe the right way to handle 
->> reductions in that context is to provide a chunk index to a parallel 
+>> As far as explicit parallelism, I now believe the right way to handle
+>> reductions in that context is to provide a chunk index to a parallel
 >> loop, vaguely analogous to an entry family index. ...
 [See AI12-0251-1 for this thread.]
 
@@ -5337,7 +5337,7 @@
 From: Randy Brukardt
 Sent: Tuesday, January 16, 2018  4:19 PM
 
-> We talked about this in Vienna (or was it Lexington), and decided to 
+> We talked about this in Vienna (or was it Lexington), and decided to
 > leave out an explicit "parallel" from the reduction expression.
 
 My recollection of this discussion (in Lexington) was a bit different. :-)
@@ -5353,13 +5353,13 @@
 because I hadn't even thought about the possibility. Now that I have, I
 strongly agree with Brad and Raphael's objections, and have more on top.
 
-> If the compiler has to be smart enough to check whether parallelism 
-> would be legal, it is smart enough to insert it. The same thing would 
-> apply to quantified expressions as well, and eventually container 
+> If the compiler has to be smart enough to check whether parallelism
+> would be legal, it is smart enough to insert it. The same thing would
+> apply to quantified expressions as well, and eventually container
 > aggregates, etc.  For these kinds of "self-contained"
-> expressions, I don't think we should create explicitly parallel 
-> versions, as it just adds complexity -- they should be seen as 
-> inherently parallel, with the compiler deciding when it is worth to 
+> expressions, I don't think we should create explicitly parallel
+> versions, as it just adds complexity -- they should be seen as
+> inherently parallel, with the compiler deciding when it is worth to
 > make them actually parallel.
 
 Ada always allows as-if optimizations, and surely if the compiler can prove
@@ -5404,8 +5404,8 @@
 From: Jean-Pierre Rosen
 Sent: Tuesday, January 16, 2018  11:41 PM
 
-> Combining these two things, parallelization can only be automatically 
-> applied when it is known that there are enough iterations and cost to 
+> Combining these two things, parallelization can only be automatically
+> applied when it is known that there are enough iterations and cost to
 > each iteration to make the savings be more than the overhead.
 I would add:
 
@@ -5417,8 +5417,8 @@
 From: Brad Moore
 Sent: Wednesday, January 17, 2018  12:07 AM
 
-> For myself, this was a complete surprise and I didn't object at least 
-> in part because I hadn't even thought about the possibility. Now that 
+> For myself, this was a complete surprise and I didn't object at least
+> in part because I hadn't even thought about the possibility. Now that
 > I have, I strongly agree with Brad and Raphael's objections, and have more
 > on top.
 
@@ -5438,7 +5438,7 @@
 Since then, we came up with the 'Reduce attribute idea. To have another
 attribute such as 'Parallel_Reduce fits quite a lot more nicely into this
 scheme. There is no additional keyword needed. It's just a substitution of a
-different attribute name. 
+different attribute name.
 
 I note also that the proposal for explicit parallelism that you proposed the
 other day relies on using 'Reduce calls, which is implicit parallelism.
@@ -5446,7 +5446,7 @@
 parallelism, shouldn't there also be a way to explicitly indicate that the
 'Reduce calls associated with that syntax are also to be executed with
 parallelism? In a lot of cases, parallelism isn't wanted, but other times it
-would be. 
+would be.
 
 ****************************************************************
 
@@ -5574,9 +5574,9 @@
 From: Randy Brukardt
 Sent: Wednesday, February  7, 2018  5:55 PM
 
-> Since my proposal seems to not have been very clear, here is my 
+> Since my proposal seems to not have been very clear, here is my
 > suggestion again from scratch.
-> 
+>
 > Currently, the proposal uses some kind of expression (aggregate?
 > container? array?) as the prefix:
 >    <expression>'Reduce (F, <value>)
@@ -5627,7 +5627,7 @@
 make it clunky to use in some contexts (needing to define a type just for it),
 but the idea of an expression with no defined type is wildly out of character
 for Ada and is a nonstarter with me. (Your proposal does nothing for this.)
- 
+
 > - All other attributes denote values or subprograms. What does 'Reduce
 >   denote?  And what is F? A name? Then 'Reduce can't be a function! It
 >   is actually a whole new kind of expression.
@@ -5648,14 +5648,14 @@
 
 A legitimate gripe, but one that applies equally to prefixed views and
 (recently) existing attributes like Obj'Image. Arguably, that bird has flown.
-Moreover, I expect that many of these expressions will be directly in array 
+Moreover, I expect that many of these expressions will be directly in array
 objects and the prefixes will be no longer than other typical attributes (more
 below).
 
-> My proposal is to change the syntax (NOT the semantics) by making F 
-> the prefix: F'Reduce (<expression>) (The issue of having or not a 
+> My proposal is to change the syntax (NOT the semantics) by making F
+> the prefix: F'Reduce (<expression>) (The issue of having or not a
 > default value is independent and deserves another discussion).
-> 
+>
 > Here:
 > - The prefix is a name - no change in syntax
 
@@ -5677,7 +5677,7 @@
 >   a regular function
 
 Sorry, but Ada does not have a type which is "syntactically an array whose
-components are appropriate for F". One cannot write such a function. This 
+components are appropriate for F". One cannot write such a function. This
 would be a function in name only, which is as useful as describing 'Pos as a
 function (not at all, leaning toward harmful).
 
@@ -5702,21 +5702,21 @@
 in Ada terms. So one would never want to treat it as a real function (in terms
 of renaming and the like).
 
-> This does NOT imply really creating an array to be passed to a regular 
-> function.  We just need to specify that the components are evaluated 
-> in no particular order.  Not creating the array is simply like putting 
+> This does NOT imply really creating an array to be passed to a regular
+> function.  We just need to specify that the components are evaluated
+> in no particular order.  Not creating the array is simply like putting
 > F'Reduce inline.
 > It's 100% "as if".
-> 
-> OTOH, a compiler would be /allowed/ to construct an array if it is 
-> more convenient or efficient in some cases, and/or to generate an 
+>
+> OTOH, a compiler would be /allowed/ to construct an array if it is
+> more convenient or efficient in some cases, and/or to generate an
 > actual function.
 
 This is surely the intent of Brad's proposal, and every proposal (even the
 crazy ones) I've seen.
 
-> We could rule that F'Reduce has convention => intrinsic if we don't 
-> want it to be passed to generics - or not. After all, all inlined 
+> We could rule that F'Reduce has convention => intrinsic if we don't
+> want it to be passed to generics - or not. After all, all inlined
 > subprograms need a regular version for generics.
 
 All of the attribute functions are already Intrinsic, but that doesn't prevent
@@ -5727,10 +5727,10 @@
 barely tested, as it never happens in practice and rarely in the ACATS.)
 
 Luckily, this doesn't matter for your proposal, as with Pos and Val and many
-others, you cannot describe the profile in Ada, so renaming/generic actuals 
+others, you cannot describe the profile in Ada, so renaming/generic actuals
 aren't possible anyway.
 
-> I think this model would be easier to understand and explain, and 
+> I think this model would be easier to understand and explain, and
 > would fit better in the current syntax.
 
 I'm not going to argue with an opinion, but there is no difference in the
@@ -5777,11 +5777,11 @@
 
 From an admittedly personal perspective, it feels like the ongoing discussion
 of the "reduction expression" seems to have entered into the realm of "design
-by committee," which I think we have usually avoided in the past. The notion 
+by committee," which I think we have usually avoided in the past. The notion
 of a "reduction expression" was suggested a year or so ago, in part because I
 have found it to be extraordinarily useful in ParaSail, and because it
 included the notion of "reduction" which keeps coming up in discussions of
-parallelism.  
+parallelism.
 
 I have also seen the use of containers grow in Ada 2012 relative to Ada 2005
 largely because of the iterator "syntactic sugar" added in Ada 2012.  The
@@ -5803,13 +5803,13 @@
 
 For Ada 2020, building on the iterator "syntactic sugar" to me seems a key to
 growing the real power of programming in Ada.  Things like iterator "filters"
-are part of that -- they are not just "gewgaws."   Various combiners such as 
+are part of that -- they are not just "gewgaws."   Various combiners such as
 concurrent iterators also add to the power.
 
 Below is a copy of the set of examples of use of map/reduce expressions from
 an earlier e-mail, with ParaSail's "|" replaced with Ada's "&" where
 appropriate and "when" used for filters.  One thing to notice is that many of
-the "reductions" involve concatenations of strings, which clearly would be a 
+the "reductions" involve concatenations of strings, which clearly would be a
 real pain if you first had to create an array of strings, something which is
 very awkward in Ada.
 
@@ -5835,7 +5835,7 @@
 -- The "when bool" is a filter that controls whether any action is taken
 -- for a particular item in the container
 
-const Num_Outputs := 
+const Num_Outputs :=
                      (for each P of Formal_Params
                        when P.Is_Operation_Output => <0> + 1)
 
@@ -5875,7 +5875,7 @@
 --  the filter (i.e. are either outputs or variables).
 --  Rather than using the most positive and most negative values as the
 --  "identity" for Min and Max, in Parasail, null is used, meaning both
--- x(null, X) = X 
+--  Min(null, X) = X and Max(null, X) = X
 const First_Updated :=
  (for each [I => P] of Params
     when P.Is_Operation_Output or else P.Is_Var => Min(<null>, I))
@@ -5904,9 +5904,9 @@
    <" Live_Out_Set ="> & " [VN" & Loc & "=>VN" & VN & "]"))
 
 
--- es that are found by following the 
--- 
--- X := Y then Z while W" iterator which 
+--  Count the number of nodes that are found by following the
+--  Parent chain.
+--  This uses the general "X := Y then Z while W" iterator which
 --  initializes X to Y, and then sets it to Z on each subsequent
 --  iteration, as long as "W" remains true.
 return (for N := Node_Id then Node_Tree.Parent[N]
@@ -5917,7 +5917,7 @@
 assert ((for each Size of Bit_Field_Sizes => <0> + abs(Size)) <= 63);
 
 
--- ng from Bit-Field Key to the 
+--  Build up an array mapping from Bit-Field Key to the
 --  sum of the sizes of all bit-fields preceding this bitfield
 const Bit_Field_Offsets : Array<Univ_Integer, Indexed_By => Key_Type> :=
  [for each [Key => Size] of Bit_Field_Sizes, Key =>
@@ -5946,7 +5946,7 @@
 true).
 
 > The reduction
-> expression was new "syntactic sugar" that combined the power of 
+> expression was new "syntactic sugar" that combined the power of
 > iterators with the notion of Map/Reduce.
 
 I recall the lead designer of Ada 95 telling people far and wide that Ada was
@@ -5955,7 +5955,7 @@
 becomes clear when one just needs to reduce an array (likely to be a very
 common operation associated with normal parallel loops in Ada 2020). With your
 old proposed syntax, you have to write an iterator to do that, even though the
-iterator is irrelevant noise. 
+iterator is irrelevant noise.
 
 Ada has a great support for creating maps (in the form of arrays), and Ada
 2020 promises to extend that even further. So what's missing is a Reduce
@@ -5965,24 +5965,24 @@
 having an actual reduce primitive just throws away all of the Ada history of
 building blocks.
 
-> The notion of "map/reduce" refers to performing an operation (the 
-> "map" part) on some or all elements of some large data set or large 
-> search space, and then combining the individual results (the "reduce" 
+> The notion of "map/reduce" refers to performing an operation (the
+> "map" part) on some or all elements of some large data set or large
+> search space, and then combining the individual results (the "reduce"
 > part) to produce a single "reduced"
-> result.  After many back and forths on e-mail, we seem to have ended 
-> up with little more than something which applies an operator over the 
-> elements of an array, and makes no use of the power of "syntactic 
-> sugar" to raise the level of abstraction.  To me this is quite a 
+> result.  After many back and forths on e-mail, we seem to have ended
+> up with little more than something which applies an operator over the
+> elements of an array, and makes no use of the power of "syntactic
+> sugar" to raise the level of abstraction.  To me this is quite a
 > disappointment.
 
 Raising the "level of abstraction" beyond the level that most programmers can
 understand is not helpful. I'd rather the handful of super-advanced programmers
 be disappointed. And there is nothing natural or understandable about the
-combination of Map/Reduce. It only began to make sense when we started talking 
+combination of Map/Reduce. It only began to make sense when we started talking
 about Reduce on its own. (I think it exists solely because of its potential to
 describe parallelism -- Ada is about describing the problem in programming
 language terms, not making the problem unrecognizable to fit some preconceived
-notion of abstraction.) 
+notion of abstraction.)
 
 Adding a bunch of "syntactic broccoli" to describe a construct that few will
 understand or use is not an improvement.
@@ -5996,10 +5996,10 @@
 interleave the evaluation of the prefix (and its parts) with the actual Reduce
 operation (I think it is missing in Brad's proposal).
 
-> I believe iterators are probably the most important new "programming" 
-> feature of Ada 2012, while pre/postconditions are the most important 
-> new "declarative" feature.  And for that matter, iterator-based 
-> constructs like quantified expressions are critical to writing concise 
+> I believe iterators are probably the most important new "programming"
+> feature of Ada 2012, while pre/postconditions are the most important
+> new "declarative" feature.  And for that matter, iterator-based
+> constructs like quantified expressions are critical to writing concise
 > pre/postconditions.
 
 I definitely do not agree with this. For loop iterators for containers are a
@@ -6009,11 +6009,11 @@
 if necessary. (We do need to be able to declare such functions "extra pure" ==
 global null, so that they can be optimized algebraically - Ada 2020 ought to
 fix that.)
- 
-> For Ada 2020, building on the iterator "syntactic sugar" to me seems a 
-> key to growing the real power of programming in Ada.  Things like 
+
+> For Ada 2020, building on the iterator "syntactic sugar" to me seems a
+> key to growing the real power of programming in Ada.  Things like
 > iterator "filters" are part of that -- they
-> are not just "gewgaws."   Various combiners such as 
+> are not just "gewgaws."   Various combiners such as
 > concurrent iterators also add to the power.
 
 We surely need concurrent iterators, but more than that is just too much.
@@ -6021,16 +6021,16 @@
 -- that's done by providing good abstract operations for an ADT. It's exposing
 more implementation than necessary.
 
-... 
-> So I would recommend we return to a powerful syntax that is based on 
+...
+> So I would recommend we return to a powerful syntax that is based on
 > iterators, with operators in their natural "infix"
-> position, etc.  Otherwise, I believe we are wasting the opportunity 
+> position, etc.  Otherwise, I believe we are wasting the opportunity
 > and not really adding anything significant.
 
 I'm afraid that reraising this at this point is likely to kill Ada 2020, as
 there is little likelihood of consensus on this point.
 
-> At a minimum, we should look at how examples like those below would be 
+> At a minimum, we should look at how examples like those below would be
 > supported in whatever proposal we come up with.
 
 I agree with this. I did the first few examples here. The annoyance (IMHO) is
@@ -6058,7 +6058,7 @@
 
 
 ...
-> const Num_Outputs := 
+> const Num_Outputs :=
 >                      (for each P of Formal_Params
 >                        when P.Is_Operation_Output => <0> + 1)
 
@@ -6078,7 +6078,7 @@
 -- This is essentially the same as the last one:
 
     Num_Outputs_or_Vars : constant Natural :=
-        Mapper'(for each P of Formal_Params => 
+        Mapper'(for each P of Formal_Params =>
            Boolean'Pos(P.Is_Operation_Output or else P.Is_Var)'Reduce("+",0);
 
 > -- Count the number of non-inputs
@@ -6087,17 +6087,17 @@
 
 -- This is identical to the first, not going to waste my time copying it.
 
-> --  This asserts that the sum of the sizes in Bit_Field_Sizes should 
-> be <= 63 assert ((for each Size of Bit_Field_Sizes => <0> + abs(Size)) 
+> --  This asserts that the sum of the sizes in Bit_Field_Sizes should
+> be <= 63 assert ((for each Size of Bit_Field_Sizes => <0> + abs(Size))
 > <= 63);
 
 pragma assert (Mapper'(for each Size of Bit_Field_Sizes =>
                         abs(Size))'Reduce("+",0) <= 63);
 
-> --  Hash the characters of a string to produce an integer func 
+> --  Hash the characters of a string to produce an integer func
 > Hash_Vec(Vec : Vector<Univ_Character>) return Univ_Integer is
 >   return (for I in 1 .. Vec'Length reverse =>
->     (<0> * 127 + (Vec[I] - Char_First)) mod Hash_Modulus) end func 
+>     (<0> * 127 + (Vec[I] - Char_First)) mod Hash_Modulus) end func
 > Hash_Vec
 
 -- I think this one needs a helper function (I used Ada types here):
@@ -6149,16 +6149,16 @@
 From: Brad Moore
 Sent: Thursday, February  8, 2018  10:45 PM
 
-> Before we look at Tuck's example's (many of which are weird uses for 
+> Before we look at Tuck's example's (many of which are weird uses for
 > this construct), let's look at a pair of common cases written both ways:
-> 
+>
 > -- Reducing the result of a manually-chunked parallel loop:
-> 
+>
 > -- Parasail-like:
 >    Sum := (for E of Partial_Sum => <0> + E);
 > -- AI12-0242-1:
 >    Sum := Partial_Sum'Reduce("+", 0);
-> 
+>
 > -- Calculating the sum of squares for a data array (part of statistics gathering):
 > -- Parasail-like:
 >    Sum_Sqr := (for I in Data'range => <0.0> + Data(I)*Data(I));
@@ -6184,11 +6184,11 @@
 >> const Num_Outputs :=
 >>                      (for each P of Formal_Params
 >>                        when P.Is_Operation_Output => <0> + 1)
-> 
+>
 >    type Mapper is array (Positive range <>) of Natural;
-> 
+>
 >    Num_Outputs : constant Natural :=
->        Mapper'(for each P of Formal_Params => 
+>        Mapper'(for each P of Formal_Params =>
 > Boolean'Pos(P.Is_Operation_Output)'Reduce("+",0);
 
 I think a paren is missing, and the "each" keyword which doesn't exist in
@@ -6225,19 +6225,19 @@
 
 For the remaining examples, I will assume that we don't have the "When" clause.
 
-> -- I'm assuming that we add all iterators to array aggregates as 
+> -- I'm assuming that we add all iterators to array aggregates as
 > previously proposed (although no one has written that up).
-> 
->> --  This counts the number of parameters that are outputs or 
+>
+>> --  This counts the number of parameters that are outputs or
 >> variables
 >> --  with the filter in "{...}" again
 >> (for each P of Params
 >>    when P.Is_Operation_Output or else P.Is_Var => <0> + 1)
-> 
+>
 > -- This is essentially the same as the last one:
-> 
+>
 >    Num_Outputs_or_Vars : constant Natural :=
->        Mapper'(for each P of Formal_Params => 
+>        Mapper'(for each P of Formal_Params =>
 > Boolean'Pos(P.Is_Operation_Output or else P.Is_Var)'Reduce("+",0);
 
 Same typos apply here (missing paren and extra each) ....
@@ -6246,7 +6246,7 @@
          Mapper'(for P of Formal_Params =>
             Boolean'Pos(P.Is_Operation_Output or else P.Is_Var))'Reduce("+",0);
 
-or 
+or
    function Output_Or_Vars return Mapper is
       (for P of Formal_Params => Boolean'Pos(P.Is_Operation_Output or else P.Is_Var => 0));
 
@@ -6254,17 +6254,17 @@
 
 
 ...
-> -- If 'Reduce allowed the first parameter of the combiner to be of 
+> -- If 'Reduce allowed the first parameter of the combiner to be of
 > another type,
-> -- and you didn't insist on hashing in reverse (which seems 
+> -- and you didn't insist on hashing in reverse (which seems
 > unnecessary to me),
 > -- this could be a lot simpler:
-> 
->   function Hash_One (New_Char : Character; Pending : Natural) return 
+>
+>   function Hash_One (New_Char : Character; Pending : Natural) return
 > Natural is
->      ((Pending * 127 + (Character'Pos(New_Char) - Char_First)) mod 
+>      ((Pending * 127 + (Character'Pos(New_Char) - Char_First)) mod
 > Hash_Modulus);
-> 
+>
 >   function Hash_Vec(Vec : String) return Natural is
 >     (Vec'Reduce(Hash_One, 0));
 
@@ -6276,31 +6276,31 @@
 >> --  stopping when either reaches its end.
 >> --  So in the case, we are iterating over the elements of "DF" while
 >> --  also doing a simple "X := Y then Z"
->> --  iterator which sets X to Y initially and then sets it to Z on 
+>> --  iterator which sets X to Y initially and then sets it to Z on
 >> each
 >> --  subsequent iteration.
 >> const DF_Image :=
 >>  (for (F in DF; Sep := "" then ", ") => <"["> & Sep & F) & "]"
-> 
-> -- I cannot for the life of me figure how this is supposed to work. 
+>
+> -- I cannot for the life of me figure how this is supposed to work.
 > Which
 > -- demonstrates my original point, I think...
-> 
+>
 > -- In any case, for a language that supposedly is mostly used for uses
 > -- where no heap use is allowed, this kind of example (which will use
-> -- loads of heap doing all of these concats) seems silly. I'd rather 
+> -- loads of heap doing all of these concats) seems silly. I'd rather
 > use
 > -- 'Image function overloading for this sort of thing anyway.
-> 
-> I give up here for now. I think we may need to allow the first 
-> parameter of the combiner to be of a different type in order to avoid 
-> having clunky conversions (and unnecessary array types) in the uses of 
+>
+> I give up here for now. I think we may need to allow the first
+> parameter of the combiner to be of a different type in order to avoid
+> having clunky conversions (and unnecessary array types) in the uses of
 > Reduce, but I'd like to see more examples that are actually reducing something...
-> 
-> One thought: "'Reduce("+",0)" is so common that one could easily 
-> imagine having a "'Sum" operation with just that meaning; it would 
+>
+> One thought: "'Reduce("+",0)" is so common that one could easily
+> imagine having a "'Sum" operation with just that meaning; it would
 > simplify the writing of a lot of these operations.
-> 
+>
 
 Carrying on from where Randy left off....
 
@@ -6321,7 +6321,7 @@
 those examples which seem to involve reduction of string arrays into a string
 result may not be as common a use case as it might seem, looking at the
 broader usage. We might find that in 90% or more of cases the reductions involve parameters
-and results of the same type. 
+and results of the same type.
 
 -- Parasail
 return (for R in Rngs forward =>
@@ -6357,7 +6357,7 @@
 --  the filter (i.e. are either outputs or variables).
 --  Rather than using the most positive and most negative values as the
 --  "identity" for Min and Max, in Parasail, null is used, meaning both
--- x(null, X) = X 
+--  Min(null, X) = X and Max(null, X) = X
 
 -- Parasail
 const First_Updated :=
@@ -6400,12 +6400,12 @@
 From: Tucker Taft
 Sent: Friday, February  9, 2018  10:00 AM
 
-> I recall the lead designer of Ada 95 telling people far and wide that 
-> Ada was about providing building blocks, not finished structures. This 
-> was a classic example of a "finished" feature rather than a set of 
-> building blocks. That becomes clear when one just needs to reduce an 
-> array (likely to be a very common operation associated with normal 
-> parallel loops in Ada 2020). With your old proposed syntax, you have 
+> I recall the lead designer of Ada 95 telling people far and wide that
+> Ada was about providing building blocks, not finished structures. This
+> was a classic example of a "finished" feature rather than a set of
+> building blocks. That becomes clear when one just needs to reduce an
+> array (likely to be a very common operation associated with normal
+> parallel loops in Ada 2020). With your old proposed syntax, you have
 > to write an iterator to do that, even though the iterator is irrelevant noise.  ...
 
 It seems our respective notions of "building blocks" don't agree.  I believe
@@ -6414,7 +6414,7 @@
 by an iterator, optionally performing a computation on them (using the
 existing Ada computational  "building blocks"), and then combining the
 elements into a single value (this is the new building block using
-"<initial val>" or some other syntax).  
+"<initial val>" or some other syntax).
 
 Building blocks always produce somewhat more verbose representations for any
 given special case (e.g. reducing an array), but ultimately provide much more
@@ -6429,7 +6429,7 @@
 object with a single binary function.  By using more flexible syntax, we can
 support reduction of a sequence of values produced in many different ways,
 without ever having to produce in memory a single concrete object representing
-the sequence.    
+the sequence.
 
 The original discussion of 'Reduce had it combined with a container aggregate,
 and at that point I presumed we were talking about a very special
@@ -6913,7 +6913,7 @@
 to construct an array or a container object, or can be an argument to a
 reduction operation. As such they are a natural prefix for an attribute, if
 this is the choice syntax. That prefix may look like an aggregate but of course
-there is no need to build one, and therefore no type needs to be specified and 
+there is no need to build one, and therefore no type needs to be specified and
 the resolution rules are the same as for any other iterator construct.
 
 b) The specification of the default of the reduction by means of a bracketed
@@ -6953,9 +6953,9 @@
 long as the overload resolution rules for aggregate'Reduce are appropriate for
 the purpose, and don't require the definition of a named type for the
 pseudo-aggregate just so you can use qualified-expression syntax.  It also
-presumes we define container aggregates.  
+presumes we define container aggregates.
 
-Both syntactic approaches rely on having iterators with filters.  I have no 
+Both syntactic approaches rely on having iterators with filters.  I have no
 particular problem with allowing "when" clauses on more kinds of statements,
 but that seems to me (but not to everyone, I guess) rather orthogonal to their
 role as filters.  Their role as filters is closer to their role as entry-body
@@ -6969,7 +6969,7 @@
 whereas there is exactly one prefix to an attribute, and we already know that
 some attributes (such as 'Access) have special overloading rules for their
 prefix. I also happen to think it is more natural to have the iterator appear
-first, though that is clearly a personal preference.  That is, I like to see 
+first, though that is clearly a personal preference.  That is, I like to see
 the generator of the values first, and then the operation to be performed on
 the generated values.  Finally, putting the iterator inside the parameter list
 would result in two levels of parentheses in most cases, which adds to the
@@ -6980,7 +6980,7 @@
 From: Randy Brukardt
 Sent: Friday, February  9, 2018  4:44 PM
 
-> I hesitate to wade into this heated discussion, but I'd like to try to 
+> I hesitate to wade into this heated discussion, but I'd like to try to
 > summarize my view on the possible approaches to Reduce operations:
 
 Thanks for doing so. Someone has to be a voice of moderation here. :-)
@@ -6993,10 +6993,10 @@
 significant.
 
 ...
-> e) Container aggregates fall naturally out of these iterators. We do 
-> need to generalize named associations to handle map containers of 
-> course, and Tuck has already proposed such. Cross-product iterators, 
-> i.e. nested iterators to describe multi-dimensional constructions, are 
+> e) Container aggregates fall naturally out of these iterators. We do
+> need to generalize named associations to handle map containers of
+> course, and Tuck has already proposed such. Cross-product iterators,
+> i.e. nested iterators to describe multi-dimensional constructions, are
 > useful but I'm not sure the syntax is clear enough.
 
 This I (mildly) disagree with. Perhaps this is mainly terminology. I see
@@ -7038,7 +7038,7 @@
 complaining that we hadn't written any such examples).
 
 He doesn't seem to realize that I've spent a lot of time thinking about the
-bare aggregate case; the problem is that I can't make it work, because the 
+bare aggregate case; the problem is that I can't make it work, because the
 type of the prefix is needed -- it's what determines the type/profiles of the
 various other parts of the attribute. I don't want to invent new kinds of
 type-less things (that's not Ada, and besides resolution is complex enough).
@@ -7051,8 +7051,8 @@
 From: Randy Brukardt
 Sent: Friday, February  9, 2018  4:53 PM
 
->This claim of little value for quantified expressions may be based on 
->your own coding style, or anticipation thereof, but we have a number of 
+>This claim of little value for quantified expressions may be based on
+>your own coding style, or anticipation thereof, but we have a number of
 >folks actually using Ada 2012, and SPARK 2014 which is based on it, and
 >quantified expressions are used heavily.
 
@@ -7074,13 +7074,13 @@
 Sent: Friday, February  9, 2018  5:57 PM
 
 ...
-> Building blocks always produce somewhat more verbose representations 
-> for any given special case (e.g. reducing an array), but ultimately 
-> provide much more power to the programmer when the programmer wants to 
-> go beyond a specially supported special case.  I can see that the 
-> 'Reduce attribute works nicely for the array special case, but fails 
-> when the elements you want to reduce represent some other sort of 
-> sequence of items, such as that which can be produced by a combination 
+> Building blocks always produce somewhat more verbose representations
+> for any given special case (e.g. reducing an array), but ultimately
+> provide much more power to the programmer when the programmer wants to
+> go beyond a specially supported special case.  I can see that the
+> 'Reduce attribute works nicely for the array special case, but fails
+> when the elements you want to reduce represent some other sort of
+> sequence of items, such as that which can be produced by a combination
 > of an iterator, an optional filter, and an Ada expression.
 
 We're going to have to agree to disagree on what is and is not a building
@@ -7105,13 +7105,13 @@
 to declare an appropriate array type annoying, but I'm unable to find any sane
 semantics that avoids it.
 
-> I think it is a big loss to only support reduction over an existing 
-> concrete object with a single binary function.  By using more flexible 
-> syntax, we can support reduction of a sequence of values produced in 
-> many different ways, without ever having to produce in memory a single 
-> concrete object representing the sequence.    
+> I think it is a big loss to only support reduction over an existing
+> concrete object with a single binary function.  By using more flexible
+> syntax, we can support reduction of a sequence of values produced in
+> many different ways, without ever having to produce in memory a single
+> concrete object representing the sequence.
 
-Again, you're totally ignoring what is actually proposed. The actual proposal 
+Again, you're totally ignoring what is actually proposed. The actual proposal
 includes (or is intended to, I think Brad may have left it out) a
 notwithstanding rule allowing interleaving of the evaluation of the prefix and
 'Reduce, including if the prefix is an aggregate, interleaving of the
@@ -7126,19 +7126,19 @@
 issue and it is intensely frustrating to have you raise this bogeyman
 repeatedly.
 
-> The original discussion of 'Reduce had it combined with a container 
-> aggregate, and at that point I presumed we were talking about a very 
-> special interpretation of the container aggregate, where it didn't 
-> have a type, it just represented a sequence of values that are to be 
-> reduced.  But that presumes we agree on the syntax of a container 
-> aggregate and the special significance of 'Reduce applied to such an 
+> The original discussion of 'Reduce had it combined with a container
+> aggregate, and at that point I presumed we were talking about a very
+> special interpretation of the container aggregate, where it didn't
+> have a type, it just represented a sequence of values that are to be
+> reduced.  But that presumes we agree on the syntax of a container
+> aggregate and the special significance of 'Reduce applied to such an
 > aggregate.
->  But if we start insisting that this is a "normal" attribute with 
-> normal rules such as being able to resolve the prefix to a single, 
-> pre-existing type, then the idea falls apart in my view.  So if we 
-> have heartburn with giving the aggregate'Reduce attribute special 
-> overloading rules, etc., then we should go back to some sort of new 
-> syntax rather than lose all of the power of using an iterator, filter, 
+>  But if we start insisting that this is a "normal" attribute with
+> normal rules such as being able to resolve the prefix to a single,
+> pre-existing type, then the idea falls apart in my view.  So if we
+> have heartburn with giving the aggregate'Reduce attribute special
+> overloading rules, etc., then we should go back to some sort of new
+> syntax rather than lose all of the power of using an iterator, filter,
 > etc., which the container aggregate would provide.
 
 The insight that I had is that this operation really doesn't have anything to
@@ -7204,13 +7204,13 @@
 From: Edmond Schonberg
 Sent: Sunday, February 11, 2018  2:27 PM
 
-> He [Tuck] doesn't seem to realize that I've spent a lot of time 
-> thinking about the bare aggregate case; the problem is that I can't 
-> make it work, because the type of the prefix is needed -- it's what 
-> determines the type/profiles of the various other parts of the 
-> attribute. I don't want to invent new kinds of type-less things 
-> (that's not Ada, and besides resolution is complex enough). So I was 
-> hoping someone more invested (like, say, Mr. Taft) would make an 
+> He [Tuck] doesn't seem to realize that I've spent a lot of time
+> thinking about the bare aggregate case; the problem is that I can't
+> make it work, because the type of the prefix is needed -- it's what
+> determines the type/profiles of the various other parts of the
+> attribute. I don't want to invent new kinds of type-less things
+> (that's not Ada, and besides resolution is complex enough). So I was
+> hoping someone more invested (like, say, Mr. Taft) would make an
 > actual proposal for such an extension, rather than just going back to square
 > one and complaining.
 
@@ -7267,7 +7267,7 @@
 a Trivial Problem - especially around 40 min when he's talking about algebraic
 properties and then goes into parallel properties, eventually drawing the
 parallel between garbage-collection and parallelization (the former being
-assigning data to memory, the latter assigning work [code] to processors). 
+assigning data to memory, the latter assigning work [code] to processors).
 
 It also occurs to me that the application/promotion of ranges into a
 first-class Ada citizen might help; perhaps it might not be required, but as
@@ -7338,8 +7338,8 @@
 
 Tucker and I have been having a lengthy private back-and-forth on whether it
 is necessary to explicitly have a separate "parallel_reduce" attribute.
-(We've also hashed out some other issues, mostly covered elsewhere.) I think 
-we understand each others position (not that either of us have convinced the 
+(We've also hashed out some other issues, mostly covered elsewhere.) I think
+we understand each others position (not that either of us have convinced the
 other of much). Tucker clearly is much more optimistic about what parallelism
 can be automatically inserted for "normal" Ada code.
 
@@ -7351,7 +7351,7 @@
 faster. It is very difficult to prove that for parallelism, especially if the
 overhead is relatively high (as it is on most normal OSes).
 
-(2) (1) means that compiler writers have little incentive to provide any 
+(2) (1) means that compiler writers have little incentive to provide any
 parallelism for a construct like this. Including a keyword/separate attribute
 prods them to support it.
 
@@ -7458,7 +7458,7 @@
 tasking/parallelism will be easily determinable.)
 
 Parallelism is relatively natural in many problems. Some examples from my
-experience: the web server and e-mail filter both use pools of worker tasks 
+experience: the web server and e-mail filter both use pools of worker tasks
 that are assigned jobs as they come in (from the public internet). Each job is
 independent of any other, just depending on some global settings. One could
 have used a parallel loop in Ada 2020 to get a similar effect (although that
@@ -7482,11 +7482,11 @@
 Finally, point (6). If a problem is naturally parallel, one will want to write
 parallel code for that high-level parallelism -- and only that code.
 If that code turns out to be best expressed as a Reduce attribute, they
-certainly will not be happy to have to turn it into a parallel loop instead 
+certainly will not be happy to have to turn it into a parallel loop instead
 in order to require the generation of parallel behavior.
 
-Tucker does point out that the introduction of "parallel" anywhere really 
-doesn't require any parallel execution; a compiler could run everything on a 
+Tucker does point out that the introduction of "parallel" anywhere really
+doesn't require any parallel execution; a compiler could run everything on a
 single processor (and might be forced to do that by OS-level allocations).
 But it does require the safety checks that everything is allowed to be run in
 parallel (4); the user will not be writing code that can only be executed
@@ -7495,7 +7495,7 @@
 ---
 
 While I find none of these points compelling by themselves, the combination
-makes it pretty clear to me that we do want a separate parallel version of 
+makes it pretty clear to me that we do want a separate parallel version of
 the Reduce attribute. But we don't want the regular Reduce to have any rules
 that would prevent automatic parallelization in the future (presuming that it
 could have been a Parallel_Reduce).
@@ -7519,8 +7519,8 @@
 From: Bob Duff
 Sent: Saturday, February 17, 2018  5:17 PM
 
-> A subjective opinion, but I find it hard to believe that a compiler 
-> vendor would provide a difficult optimisation if it were optional, I 
+> A subjective opinion, but I find it hard to believe that a compiler
+> vendor would provide a difficult optimisation if it were optional, I
 > think we need both ‘Reduce and ‘Parallel_Reduce.
 
 But optimizations are always optional.  What's to stop an implementer from
@@ -7552,7 +7552,7 @@
 variable in a register is an example, in that if the variable is too long
 lived, it can cause too much register pressure, ultimately making surrounding
 code less efficient.  I have given up on this battle, but I predict that in
-the long run this distinction between 'Reduce and 'Parallel_Reduce will be 
+the long run this distinction between 'Reduce and 'Parallel_Reduce will be
 seen in the same light as C's old "register" annotation on local variables.
 It represents a claim that there are no bad usages (e.g. 'Address on a
 "register," data dependencies for 'Parallel_Reduce), but other than that, the
@@ -7587,13 +7587,13 @@
 loops where some programmer control can be useful.  For this particular case,
 the more appropriate place to put guidance would be on the "combiner"
 operation rather than on each use of it, vaguely analogous to putting a pragma
-Inline on a subprogram rather than on each call.  
+Inline on a subprogram rather than on each call.
 
 Many combiners are going to be builtins like Integer "+" about which the
 compiler presumably knows what it needs to know.  For user-defined combiners
 (e.g. some kind of set "union" operation), some measure of typical execution
 cost might be helpful, as might some measure of the expense of the Next
-operation of an iterator type.  
+operation of an iterator type.
 
 Perhaps some kind of subprogram "Cost" aspect could be defined, ranging from,
 say, 1 = cheap to 10 = very expensive (probably not a linear scale!  Perhaps
@@ -7601,9 +7601,9 @@
 be used to indicate at what cost parallelization becomes interesting, with
 lower numbers meaning more aggressive parallelization, and bigger numbers
 meaning minimal parallelization.  This global number could be interpreted as
-the relative cost of spawning a tasklet.  The Cost numbers wouldn't need to 
+the relative cost of spawning a tasklet.  The Cost numbers wouldn't need to
 have any absolute meaning -- they are merely a way to rank subprograms, and a
-way to indicate at what level the expense of executing the subprogram dwarfs 
+way to indicate at what level the expense of executing the subprogram dwarfs
 the overhead of spawning a tasklet.  Each increment in Cost might represent a
 factor of, say, 10 in relative cost, so 1 = 1 instruction, 2 = 10 instructions,
 10 = 1Giga instructions.  If spawning a tasklet takes, say 100 instructions
@@ -7617,7 +7617,7 @@
 
 The main point of all of this is that in my view we don't want programmers
 deciding at the individual reduction expression whether to parallelize or not.
-Based on my experience, there will be a lot of these scattered about, and you 
+Based on my experience, there will be a lot of these scattered about, and you
 really don't want to get bogged down in deciding about 'Reduce vs.
 'Parallel_Reduce each time you use one of these, nor going back and editing
 individual expressions to do your tuning.
@@ -7627,12 +7627,12 @@
 From: Brad Moore
 Sent: Saturday, February 17, 2018  1:50 PM
 
-> I was drawing the distinction between individual expressions, where 
-> the compiler does not generally expect direction for how to do 
-> register allocation, instruction scheduling, etc. vs. multi-statement 
-> constructs like loops where some programmer control can be useful.  
-> For this particular case, the more appropriate place to put guidance 
-> would be on the "combiner" operation rather than on each use of it, 
+> I was drawing the distinction between individual expressions, where
+> the compiler does not generally expect direction for how to do
+> register allocation, instruction scheduling, etc. vs. multi-statement
+> constructs like loops where some programmer control can be useful.
+> For this particular case, the more appropriate place to put guidance
+> would be on the "combiner" operation rather than on each use of it,
 > vaguely analogous to putting a pragma Inline on a subprogram rather than
 > on each call.
 
@@ -7689,7 +7689,7 @@
 sources, but Ada doesn't have that problem.
 
 I think a main point to consider is that parallel reduction is really quite a
-different algorithm than sequential reduction. Yes it is an optimization, but 
+different algorithm than sequential reduction. Yes it is an optimization, but
 it is also a change of algorithm, which may be worth indicating the specific
 places in the source code where this alternate algorithm is desired. The
 default usage I think would be 'Reduce, where the programmer wants the
@@ -7703,17 +7703,17 @@
 From: Erhard Ploedereder
 Sent: Sunday, February 18, 2018  4:22 PM
 
-> The main point of all of this is that in my view we don't want 
-> programmers deciding at the individual reduction expression whether to 
-> parallelize or not.  Based on my experience, there will be a lot of 
-> these scattered about, and you really don't want to get bogged down in 
-> deciding about 'Reduce vs. 'Parallel_Reduce each time you use one of 
-> these, nor going back and editing individual expressions to do your 
+> The main point of all of this is that in my view we don't want
+> programmers deciding at the individual reduction expression whether to
+> parallelize or not.  Based on my experience, there will be a lot of
+> these scattered about, and you really don't want to get bogged down in
+> deciding about 'Reduce vs. 'Parallel_Reduce each time you use one of
+> these, nor going back and editing individual expressions to do your
 > tuning.
 
 For >50 years, we as a community have tried to create compilers that
 parallelize sequential code based on "as if"-rules. We have not succeeded.
-"As if" is simply too limiting. What makes anybody believe that we as Ada 
+"As if" is simply too limiting. What makes anybody believe that we as Ada
 folks can succeed where others have failed?
 
 The reasonably successful models all based on "damn it, parallelize and
@@ -7731,13 +7731,13 @@
 From: Edmond Schonberg
 Sent: Sunday, February 18, 2018  5:35 PM
 
-Indeed, I don’t think Ada can bring anything that will suddenly make 
+Indeed, I don’t think Ada can bring anything that will suddenly make
 parallelization easy and effective. For a user concerned with performance,
 tuning machinery is indispensable:  that means profiling tools and
 annotations, which will invariably have to be target-dependent. The two
 best-known langage-independent (kind of) models of distribution and parallel
 computation in use today, OpenMP and OpenACC, both choose to use a pragma-like
-syntax to annotate a program that uses the standard syntax of a sequential 
+syntax to annotate a program that uses the standard syntax of a sequential
 language (Fortran, C, C++). This makes the inescapably iterative process of
 tuning a program easier, because only the annotations need to be modified.
 Those annotations typically carry target-specific information (number of
@@ -7747,12 +7747,12 @@
 the applicability of such a pragma the way it can warn on older optimization
 pragmas.
 
-From an implementation point of view there will be a big advantage in being 
-close to OpenMP and/or OpenACC given that several compiler frameworks (GCC, 
-LLVM) support these annotations. 
+From an implementation point of view there will be a big advantage in being
+close to OpenMP and/or OpenACC given that several compiler frameworks (GCC,
+LLVM) support these annotations.
 
-As an aside, something that the discussion on parallelism has omitted 
-completely so far is memory placement, and the literature on parallel 
+As an aside, something that the discussion on parallelism has omitted
+completely so far is memory placement, and the literature on parallel
 programming makes it clear that without a way of specifying where things go
 (which processor, which GPU, and when to transfer data from one to the other)
 there is no hope of getting the full performance that the hardware could
@@ -7764,25 +7764,25 @@
 From: Tucker Taft
 Sent: Sunday, February 18, 2018  6:41 PM
 
-> For >50 years, we as a community have tried to create compilers that 
-> parallelize sequential code based on "as if"-rules. We have not 
-> succeeded. "As if" is simply too limiting. What makes anybody believe 
+> For >50 years, we as a community have tried to create compilers that
+> parallelize sequential code based on "as if"-rules. We have not
+> succeeded. "As if" is simply too limiting. What makes anybody believe
 > that we as Ada folks can succeed where others have failed?
 
-I think a reduction expression might be a special case, since the entire 
+I think a reduction expression might be a special case, since the entire
 computation is fundamentally side-effect free.
 
-> The reasonably successful models all based on "damn it, parallelize 
-> and ignore the consequences vis-a-vis sequential code"-semantics. Of 
-> course, if for some reason you know that the parallel code is slower 
+> The reasonably successful models all based on "damn it, parallelize
+> and ignore the consequences vis-a-vis sequential code"-semantics. Of
+> course, if for some reason you know that the parallel code is slower
 > than the sequential version, then you can optimize by generating sequential code.
 > (Seriously.)
-> 
-> So, in short, I disagree. There is a need for a 'Parallel_Reduce, less 
-> to indicate that you wish parallelization, but rather to absolve from 
+>
+> So, in short, I disagree. There is a need for a 'Parallel_Reduce, less
+> to indicate that you wish parallelization, but rather to absolve from
 > a requirement of "as if"-semantics of parallelized code.
 
-But note (I believe) that our proposal is that 'Parallel_Reduce is illegal 
+But note (I believe) that our proposal is that 'Parallel_Reduce is illegal
 (not erroneous) if it is not data-race free, and depends on the Global
 annotations.  So I am not sure we are doing what you suggest.
 
@@ -7791,13 +7791,13 @@
 From: Randy Brukardt
 Sent: Monday, February 19, 2018  9:07 PM
 
-... 
-> > For >50 years, we as a community have tried to create compilers that 
-> > parallelize sequential code based on "as if"-rules. We have not 
-> > succeeded. "As if" is simply too limiting. What makes anybody 
+...
+> > For >50 years, we as a community have tried to create compilers that
+> > parallelize sequential code based on "as if"-rules. We have not
+> > succeeded. "As if" is simply too limiting. What makes anybody
 > > believe that we as Ada folks can succeed where others have failed?
-> 
-> I think a reduction expression might be a special case, since the 
+>
+> I think a reduction expression might be a special case, since the
 > entire computation is fundamentally side-effect free.
 
 Two problems with that:
@@ -7816,29 +7816,29 @@
 examples that Tucker provided are verging on "tricky" (using a reduce to
 create a debugging string, for example).
 
-> > The reasonably successful models all based on "damn it, parallelize 
-> > and ignore the consequences vis-a-vis sequential code"-semantics. Of 
-> > course, if for some reason you know that the parallel code is slower 
-> > than the sequential version, then you can optimize by generating 
+> > The reasonably successful models all based on "damn it, parallelize
+> > and ignore the consequences vis-a-vis sequential code"-semantics. Of
+> > course, if for some reason you know that the parallel code is slower
+> > than the sequential version, then you can optimize by generating
 > > sequential code. (Seriously.)
-> > 
-> > So, in short, I disagree. There is a need for a 'Parallel_Reduce, 
-> > less to indicate that you wish parallelization, but rather to 
+> >
+> > So, in short, I disagree. There is a need for a 'Parallel_Reduce,
+> > less to indicate that you wish parallelization, but rather to
 > > absolve from a requirement of "as if"-semantics of parallelized code.
-> 
-> But note (I believe) that our proposal is that 'Parallel_Reduce is 
-> illegal (not erroneous) if it is not data-race free, and depends on 
-> the Global annotations.  So I am not sure we are doing what you 
+>
+> But note (I believe) that our proposal is that 'Parallel_Reduce is
+> illegal (not erroneous) if it is not data-race free, and depends on
+> the Global annotations.  So I am not sure we are doing what you
 > suggest.
 
-Any Ada parallelism will require that. Otherwise, the code has to be executed 
-purely sequentially (Ada has always said that). One could imagine a switch to 
+Any Ada parallelism will require that. Otherwise, the code has to be executed
+purely sequentially (Ada has always said that). One could imagine a switch to
 allow everything to be executed in parallel, but that by definition couldn't
 follow the canonical Ada semantics.
 
 In my view, the only parallelism that is likely to be successful for Ada is
-that applied at a fairly high-level (much like one would do with Ada tasks 
-today). For that to work, the programmer has to identify where parallelism 
+that applied at a fairly high-level (much like one would do with Ada tasks
+today). For that to work, the programmer has to identify where parallelism
 can best be applied, with the compiler helping to show when there are problems
 with that application. Such places should be just a handful in any given
 program (ideally, just one).
@@ -7847,7 +7847,7 @@
 going to be rather disappointed, regardless of what rules we adopt. (Robert
 always used to say that "most optimizations are disappointing", and that
 surely is going to be the case here.) Real performance improvement is all
-about finding the "hottest" code and then replacing that with a better 
+about finding the "hottest" code and then replacing that with a better
 algorithm. Parallel execution has a place in that replacement, but it is not
 any sort of panacea.
 
@@ -7856,32 +7856,32 @@
 From: Erhard Ploedereder
 Sent: Tuesday, February 20, 2018  12:42 PM
 
->> For >50 years, we as a community have tried to create compilers that 
->> parallelize sequential code based on "as if"-rules. We have not 
->> succeeded. "As if" is simply too limiting. What makes anybody believe 
+>> For >50 years, we as a community have tried to create compilers that
+>> parallelize sequential code based on "as if"-rules. We have not
+>> succeeded. "As if" is simply too limiting. What makes anybody believe
 >> that we as Ada folks can succeed where others have failed?
 
-> I think a reduction expression might be a special case, since the 
+> I think a reduction expression might be a special case, since the
 > entire computation is fundamentally side-effect free.
 
 Then I misunderstood the purpose of 'Parallel_Reduce.
 I thought that the parallelization also applied to the production of the
-values to be reduced, e.g., applying the filters and evaluations of the 
+values to be reduced, e.g., applying the filters and evaluations of the
 container's content values.
- 
->> The reasonably successful models all based on "damn it, parallelize 
+
+>> The reasonably successful models all based on "damn it, parallelize
 >> and ignore the consequences vis-a-vis sequential code"-semantics.
->> Of course, if for some reason you know that the parallel code is 
->> slower than the sequential version, then you can optimize by 
+>> Of course, if for some reason you know that the parallel code is
+>> slower than the sequential version, then you can optimize by
 >> generating sequential code. (Seriously.)
->> 
->> So, in short, I disagree. There is a need for a 'Parallel_Reduce, 
->> less to indicate that you wish parallelization, but rather to absolve 
+>>
+>> So, in short, I disagree. There is a need for a 'Parallel_Reduce,
+>> less to indicate that you wish parallelization, but rather to absolve
 >> from a requirement of "as if"-semantics of parallelized code.
 
-> But note (I believe) that our proposal is that 'Parallel_Reduce is 
-> illegal (not erroneous) if it is not data-race free, and depends on 
-> the Global annotations.  So I am not sure we are doing what you 
+> But note (I believe) that our proposal is that 'Parallel_Reduce is
+> illegal (not erroneous) if it is not data-race free, and depends on
+> the Global annotations.  So I am not sure we are doing what you
 > suggest.
 
 Yes, I am afraid of that. So, any program that dares to use 'Parallel_Reduce
@@ -7907,19 +7907,19 @@
 optimizations. For the vast majority of programs, they're irrelevant, and the
 same is certainly true of fine-grained parallelism. As an as-if optimization,
 it is impossible even in the rare cases where it might help (because there is
-a substantial possibility that the code would be slower - the compiler knows 
+a substantial possibility that the code would be slower - the compiler knows
 little about the function bodies and little about the number of iterations).
 
-But we *have* to do this for marketing reasons. And it can help when it is 
+But we *have* to do this for marketing reasons. And it can help when it is
 fairly coarse-grained. But *no one* can find data races by hand. (I know I
 can't -- the web server goes catatonic periodically and I have been unable to
 find any reason why.) Making it easy to have parallel execution without any
 sort of checking is a guarantee of unreliable code.
 
-I thought that the intent was that the parallelism checking would be a 
-"suppressible error", such that one could disable it with a pragma. (That of 
-course assumes that we define the concept.) So if you really think you can get 
-it right on your own, feel free to do so. I just hope no one does that in any 
+I thought that the intent was that the parallelism checking would be a
+"suppressible error", such that one could disable it with a pragma. (That of
+course assumes that we define the concept.) So if you really think you can get
+it right on your own, feel free to do so. I just hope no one does that in any
 auto or aircraft software...
 
 ****************************************************************
@@ -7999,3 +7999,309 @@
 
 ****************************************************************
 
+From: Brad Moore
+Sent: Friday, May  4, 2018  5:42 PM
+
+I have been looking at the AI regarding compare-and-swap and lock-free
+operations, and realised that such operations can be another option for
+performing reductions in parallel (similar in spirit to using protected
+objects for such reductions).
+
+I have implemented a set of generic Ada packages that match the capabilities of
+GCC's atomic primitives.
+
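+As a rough sketch (showing only the operations used in the examples below, and
+glossing over details such as memory ordering), the signed arithmetic package
+might look something like this:
+
+generic
+   type Atomic_Type is range <>;
+package Atomic_Operations.Signed_Arithmetic is
+   --  Child of a parent package Atomic_Operations (not shown). The actual
+   --  objects passed as Item are expected to be aliased and atomic.
+
+   type Reducer is access function (L, R : Atomic_Type) return Atomic_Type;
+
+   --  Atomically performs Item := Item + Value and returns the new value
+   --  (corresponding to GCC's __atomic_add_fetch built-in).
+   function Atomic_Add_And_Fetch
+     (Item  : aliased in out Atomic_Type;
+      Value : Atomic_Type) return Atomic_Type;
+
+   --  Atomically performs Item := Process (Item, Value) via a
+   --  compare-and-swap retry loop, and returns the new value.
+   function Apply_Lock_Free_Operation
+     (Item    : aliased in out Atomic_Type;
+      Process : Reducer;
+      Value   : Atomic_Type) return Atomic_Type;
+
+end Atomic_Operations.Signed_Arithmetic;
+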
+Ordinarily, updating atomic variables has a higher overhead than updating
+local variables, but if most of the calculations in a chunk of iterations can
+be applied to ordinary (non-atomic) local variables, with only the per-chunk
+results then combined via atomic or protected operations, the overhead of the
+atomic updates becomes marginal. Also, if the number of iterations is small
+enough, and the overhead of applying the reduction on every iteration is
+negligible compared to the other processing in each iteration of the loop, it
+can be worthwhile to use atomic/lock-free or protected operations for the
+reduction itself.
+
+With this in mind, I thought it might be helpful to compare the various
+possibilities for loop reduction with the proposals we have currently in the
+context of producing multiple reduction results, which is an interesting case
+that might draw one to consider parallel loop syntax.
+
+Consider the case of iterating over an array to produce the Sum, Min, and Max of
+all values of the array.
+
+Using the 'Reduce / 'Parallel_Reduce syntax alone (AI12-0242-1 proposal), we
+could write:
+
+
+Sum := A'Parallel_Reduce("+", 0);
+Min := A'Parallel_Reduce(Integer'Min, Integer'Last);
+Max := A'Parallel_Reduce(Integer'Max, Integer'First);
+
+or alternatively, to do all three in one reduction operation:
+
+type Summary is
+   record
+      Sum : Integer;
+      Min : Integer;
+      Max : Integer;
+   end record;
+
+-- Reducer function for the Reduction expression
+function Reduce (L, R : Summary) return Summary is
+  (Summary'(Sum => L.Sum + R.Sum,
+            Min => Integer'Min (L.Min, R.Min),
+            Max => Integer'Max (L.Max, R.Max)));
+
+-- Reduction expression to compute all 3 results at once
+Result : constant Summary := A'Parallel_Reduce(Reduce,
+                                               Summary'(Sum => 0,
+                                                        Min => Integer'Last,
+                                                        Max => Integer'First));
+
+
+If we include the capabilities of AI12-0190-1, function expressions, this can
+be further simplified to:
+
+type Summary is
+   record
+      Sum : Integer;
+      Min : Integer;
+      Max : Integer;
+   end record;
+
+-- Reduction expression to compute all 3 results at once
+Result : constant Summary :=
+   A'Parallel_Reduce
+       ((function (L, R) return
+           (L.Sum + R.Sum,
+            Integer'Min (L.Min, R.Min),
+            Integer'Max (L.Max, R.Max))),
+        Summary'(0, Integer'Last, Integer'First));
+
+Note: In the first case, we iterate through the array 3 times; in the second
+(and third) cases, we iterate only once. Note also that in this particular
+example, performance testing showed the first case completing more quickly
+than the second, but your mileage can vary for different problems. Sometimes
+the second approach can provide better results.
+
+Moving on to parallel loops, we could consider just the capabilities of
+AI12-0119-1. With ordinary parallel loops, reduction is not supported per se,
+but it can be accomplished if the reduction occurs on every iteration and the
+results are stored via lock-free/atomic operations (or via a protected object).
+
+-- Using my test implementation of atomic/lock free operations
+package Atomic_Integers is new
+   Atomic_Operations.Signed_Arithmetic (Atomic_Type => Integer);
+
+-- Wrapper functions are needed, since 'Access cannot be applied to the
+-- intrinsic Integer'Min and Integer'Max attribute functions directly.
+function Min_Of (L, R : Integer) return Integer is (Integer'Min (L, R));
+function Max_Of (L, R : Integer) return Integer is (Integer'Max (L, R));
+
+parallel
+for I in A'Range loop
+   declare
+      Dont_Care : Integer;
+   begin
+      --  Lengthy complex processing here
+
+      Dont_Care := Atomic_Integers.Atomic_Add_And_Fetch (Item  => Sum,
+                                                         Value => A(I));
+      Dont_Care := Atomic_Integers.Apply_Lock_Free_Operation (Item    => Min,
+                                                              Process => Min_Of'Access,
+                                                              Value   => A(I));
+      Dont_Care := Atomic_Integers.Apply_Lock_Free_Operation (Item    => Max,
+                                                              Process => Max_Of'Access,
+                                                              Value   => A(I));
+   end;
+end loop;
+
+or alternatively, one might create a more customised PO or ADT for this, which
+also reads better, but won't be lock-free unless one takes advantage of
+non-standard Lock_Free aspect compiler extensions.
+
+parallel
+for I in A'Range loop
+   --  Lengthy complex processing here
+
+   PO.Apply (A(I));  -- Updates the Max, Min, and Sum values via a PO.
+end loop;
+
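+The PO above is not spelled out; a minimal sketch of what it might look like
+(just one possibility, assuming a library-level protected object) is:
+
+protected PO is
+   procedure Apply (Value : Integer);
+   procedure Get (Total, Minimum, Maximum : out Integer);
+private
+   Sum : Integer := 0;
+   Min : Integer := Integer'Last;
+   Max : Integer := Integer'First;
+end PO;
+
+protected body PO is
+
+   procedure Apply (Value : Integer) is
+   begin
+      Sum := Sum + Value;
+      Min := Integer'Min (Min, Value);
+      Max := Integer'Max (Max, Value);
+   end Apply;
+
+   procedure Get (Total, Minimum, Maximum : out Integer) is
+   begin
+      Total   := Sum;
+      Minimum := Min;
+      Maximum := Max;
+   end Get;
+
+end PO;
+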
+This should work but likely only provides good performance if the number of
+iterations is small and the overhead for updating the atomic reduction variables
+is small compared to other processing occurring during each iteration of the
+loop.
+
+Otherwise, if the number of iterations is high, and the iteration itself is a
+significant part of the processing, then per-iteration reduction can be too
+costly to consider, but some of the other proposals, which provide loop
+chunking capabilities, can be used instead.
+
+First, consider writing such a parallel loop via the AI12-0251-1 proposal,
+where special loop chunking syntax is provided:
+
+declare
+     Partial_Sum : array (1 .. 2*Num_CPUs) of Integer := (others => 0);
+     Partial_Min : array (1 .. 2*Num_CPUs) of Integer := (others => Integer'Last);
+     Partial_Max : array (1 .. 2*Num_CPUs) of Integer := (others => Integer'First);
+begin
+     parallel (Chunk in Partial_Sum'Range) for I in A'Range loop
+        Partial_Sum(Chunk) := @ + A(I);
+        Partial_Min(Chunk) := Integer'Min(@, A(I));
+        Partial_Max(Chunk) := Integer'Max(@, A(I));
+     end loop;
+
+     Sum := Partial_Sum'Reduce("+", 0);
+     Min := Partial_Min'Reduce(Integer'Min, Integer'Last);
+     Max := Partial_Max'Reduce(Integer'Max, Integer'First);
+end;
+
+This proposal pretty much requires the use of arrays for storing partial results
+(and encourages the use of 'Reduce to provide the reductions). The reason is
+that the syntax does not allow for per-chunk variable declarations: any
+declarations inside the loop are per-iteration declarations, so per-chunk local
+variables for reduction are not available in this case. The use of array syntax
+will be familiar to existing Ada practitioners, but there may be issues in terms
+of the stack storage needed for the arrays, particularly if the size of the
+components is large, and some possible confusion or difficulty for the
+programmer in deciding how big to make the arrays. Note that the bounds of the
+three arrays would need to be identical for this to make sense and work. The
+syntax presents the problem as a single non-nested loop, which is both an
+advantage and a disadvantage: an advantage because it looks simpler, but a
+disadvantage because one cannot make per-chunk declarations, or perform
+per-chunk initialisation or finalisation.
+
+Another point to note is that, syntactically, the reductions occur sequentially
+after the parallel loop. Ideally, one might like to see the reductions occur
+during the parallel loop processing, but in this case it wouldn't make much of
+a difference: the reduction processing is likely insignificant compared to the
+processing of the parallel loop, which is probably the more typical situation.
+Note that 'Reduce was used here rather than 'Parallel_Reduce. This is because
+there is unlikely to be any benefit in doing the 'Reduce in parallel; it would
+likely be worse than sequential, as the overhead of parallelism would not be
+recovered by the small amount of work performed by the 'Reduce attribute.
+
+Moving on to the other, alternative chunking proposal: this proposal does not
+define any new syntax, but instead provides a chunking library that can break
+up a set of iterations into an array of chunks, a facility likely also needed
+for container iteration.
+
+One could write this parallel loop via the AI12-0251-2 proposal as:
+
+declare
+   subtype Loop_Index is Integer range 1 .. A'Length;
+   package Manual_Chunking is new Ada.Discrete_Chunking (Loop_Index);
+
+   Chunks : constant
+      Manual_Chunking.Chunk_Array_Iterator_Interfaces.Chunk_Array'Class
+         := Manual_Chunking.Split;
+begin
+   parallel
+   for Chunk in 1 .. Chunks.Length loop
+      declare
+         Partial_Sum : Integer := 0;
+         Partial_Min : Integer := Integer'Last;
+         Partial_Max : Integer := Integer'First;
+      begin
+
+         for I in Chunks.Start(Chunk) .. Chunks.Finish(Chunk) loop
+            Partial_Sum := @ + A(I);
+            Partial_Min := Integer'Min(@, A(I));
+            Partial_Max := Integer'Max(@, A(I));
+         end loop;
+
+         -- Updates the Max, Min, and Sum values via a PO.
+         PO.Apply(Partial_Sum, Partial_Min, Partial_Max);
+      end;
+   end loop;
+end;
+
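+Here, PO.Apply is assumed to take the three per-chunk partial results and fold
+them into the totals; a sketch of that protected procedure (the rest of the PO
+is as in the earlier sketch) might be:
+
+   procedure Apply (Part_Sum, Part_Min, Part_Max : Integer) is
+   begin
+      Sum := Sum + Part_Sum;
+      Min := Integer'Min (Min, Part_Min);
+      Max := Integer'Max (Max, Part_Max);
+   end Apply;
+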
+In this case, we have the option of declaring arrays of partial results and
+writing the code much as in the chunking syntax proposal above, but we also
+have the option of just declaring local per-chunk variables and then performing
+the reductions in parallel, as part of the processing of each chunk, using
+atomic updates or (as here) a protected object. Note that the reduction occurs
+in parallel, rather than sequentially after the parallel loop. A disadvantage
+might be that the loop appears more complex, as it involves a nested loop. An
+advantage might be that this more closely resembles the actual processing that
+is occurring, and thus involves less magic. This also allows for the possibility
+of having per-chunk initialisation code and per-chunk finalisation code, which
+can be useful for certain problems. It also allows one to query the chunk
+boundaries, which the syntax-based proposal does not currently provide. Such a
+capability could be added to that proposal if it is needed, however.
+
+Finally, consider the possibilities of using a third-party parallelism library,
+such as Paraffin. When combined with the anonymous loop body AI, it is
+interesting to note that the code ends up looking very much like the previous
+example.
+
+Using a parallelism library such as Paraffin, with the anonymous loop body
+proposal (AI12-0189-1), one could write:
+
+declare
+  subtype Loop_Index is Integer range 1 .. A'Length;
+
+  package Parallel_Loops is new Paraffin.Dynamic (Loop_Index);
+  use Parallel_Loops;
+begin
+   for (Start, Finish) of Parallel_Iterate loop
+      declare
+         Partial_Sum : Integer := 0;
+         Partial_Min : Integer := Integer'Last;
+         Partial_Max : Integer := Integer'First;
+      begin
+
+         for I in Start .. Finish loop
+            Partial_Sum := @ + A(I);
+            Partial_Min := Integer'Min(@, A(I));
+            Partial_Max := Integer'Max(@, A(I));
+         end loop;
+
+         -- Updates the Max, Min, and Sum values via a PO.
+         PO.Apply(Partial_Sum, Partial_Min, Partial_Max);
+      end;
+   end loop;
+end;
+
+This example has pretty much the same advantages and disadvantages as the
+previous proposal, except that this one relies on non-standard libraries,
+whereas the previous example does not. It might be less obvious that the Start
+and Finish variables are the chunk's start and finish iteration values, as they
+are simply parameters of the anonymous subprogram that is passed to the
+Parallel_Iterate call. Another disadvantage is that the compiler likely knows
+less about the parallelism of this loop, which may make certain usage problems
+more difficult for the compiler to detect.
+
+It is also interesting that the special loop syntax in the "for ... loop" line
+looks quite similar to the corresponding syntax in the syntax-based chunking
+proposal.
+
+What conclusions can be drawn from all this? I'm not sure. We probably still
+need to choose between AI12-0251-1 and AI12-0251-2 (or neither, if the
+anonymous loop body approach holds enough appeal).
+
+We seem to have several ways to do parallel reduction, each of which might
+appeal to different people for different problems. I think that's a good thing:
+one can write very concise expressions when needed, or apply more explicit
+approaches when desired.
+
+I still like the idea of being able to specify that a PO be lock-free, possibly
+in addition to providing a set of primitives that map to compare-and-swap
+operations et al, though it appears that there is not much support for that
+idea. One should be able to create abstractions similar to POs on top of the
+set of atomic/lock-free primitives. One worry would be that, for parallelism at
+least, there might be a tendency to move away from the use of protected
+objects, in favour of user-provided lock-free packages or abstractions, if it
+is perceived that they have better performance. Maybe that's a good thing?
+
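+For illustration only (a sketch, built on the kind of primitives shown earlier
+rather than on any agreed interface), such an abstraction might look like:
+
+package Lock_Free_Summary is
+   --  Plays the same role as the PO above, but uses lock-free updates.
+   procedure Apply (Value : Integer);
+   procedure Get (Total, Minimum, Maximum : out Integer);
+end Lock_Free_Summary;
+
+package body Lock_Free_Summary is
+
+   package Atomic_Integers is new
+      Atomic_Operations.Signed_Arithmetic (Atomic_Type => Integer);
+
+   function Min_Of (L, R : Integer) return Integer is (Integer'Min (L, R));
+   function Max_Of (L, R : Integer) return Integer is (Integer'Max (L, R));
+
+   Sum : aliased Integer := 0 with Atomic;
+   Min : aliased Integer := Integer'Last with Atomic;
+   Max : aliased Integer := Integer'First with Atomic;
+
+   procedure Apply (Value : Integer) is
+      Ignore : Integer;
+   begin
+      Ignore := Atomic_Integers.Atomic_Add_And_Fetch (Sum, Value);
+      Ignore := Atomic_Integers.Apply_Lock_Free_Operation
+                   (Min, Min_Of'Access, Value);
+      Ignore := Atomic_Integers.Apply_Lock_Free_Operation
+                   (Max, Max_Of'Access, Value);
+   end Apply;
+
+   procedure Get (Total, Minimum, Maximum : out Integer) is
+   begin
+      Total   := Sum;
+      Minimum := Min;
+      Maximum := Max;
+   end Get;
+
+end Lock_Free_Summary;
+
+Clients would then simply call Lock_Free_Summary.Apply where the earlier
+examples call PO.Apply.
+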
+One parallelism capability that we do not have, but that was recently added to
+OpenMP and was discussed at the recent IRTAW, is unstructured parallelism.
+This is the case where the parallelism structures are not as closely tied to
+fork-join semantics. One might create tasklets at various points during
+processing, but then specify independent synchronisation points in the code
+where sets of related tasklets need to complete their processing.
+
+I think this is a capability we'd likely want to explore at some point, but it
+is probably best left to Ada 2025. ;-)  For Ada 202x, it's probably best to
+focus on what we already have on the table, and provide some basic parallelism
+capabilities that cover a broad set of the parallelism spectrum, and then look
+at more esoteric (but potentially very useful) features or capabilities further
+down the road.
+
+*****************************************************************
