Version 1.5 of ai12s/ai12-0348-1.txt
!standard 4.5.10(0)                                20-01-28  AI12-0348-1/03
!class Amendment 20-01-08
!status Amendment 1-2012 20-01-15
!status ARG Approved 9-0-4  20-01-15
!status work item 20-01-08
!status received 19-10-13
!priority Medium
!difficulty Easy
!subject Remove Combiners from Reduction Expressions
!summary
Combiners are removed from reduction expressions. As a consequence, the
Accum_Type and the Value_Type must be the same for parallel reductions.
!problem
Combiners seem necessary mostly for contrived uses of reduction expressions,
and add substantial complexity to the model and resolution of reduction
expressions. We should eliminate them.
!proposal
(See Summary.)
!wording
(See !corrigendum.)
[Editor's note: Supplying only the formatted version saves me from having
to construct two parallel versions of this wording. Note that this wording
is the entire subclause; many parts are unchanged, but enough change that
it is easier to just repeat the entire thing.]
!corrigendum 4.5.10(0)
Insert new clause:
Reduction expressions provide a way to map or transform a collection of
values into a new set of values, and then summarize the values produced by
applying an operation to reduce the set to a single value result.
A reduction expression is represented as an attribute_reference of the
reduction attributes Reduce or Parallel_Reduce.
Syntax
reduction_attribute_reference ::=
    value_sequence'reduction_attribute_designator
  | prefix'reduction_attribute_designator
value_sequence ::=
'[' [parallel[(chunk_specification)]] iterated_element_association ']'
reduction_attribute_designator ::= reduction_identifier(reduction_specification)
reduction_specification ::= reducer_name, initial_value_expression
The iterated_element_association of a value_sequence shall not have
a key_expression, nor shall it have a
loop_parameter_specification that has the reserved word reverse.
The chunk_specification, if any, of a value_sequence shall be an
integer_simple_expression.
Name Resolution Rules
The expected type for a reduction_attribute_reference shall be a single
nonlimited type.
In the remainder of this subclause,
we will refer to nonlimited subtypes Value_Type and Accum_Type of a
reduction_attribute_reference. These subtypes and interpretations of
the names and expressions of a reduction_attribute_reference
are determined by the following rules:
  * Accum_Type is a subtype of the expected type of the
    reduction_attribute_reference.
  * A reducer subprogram is either subtype conformant with the following
    specification:
      function Reducer(Accumulator : Accum_Type; Value : Value_Type) return Accum_Type;
    or is subtype conformant with the following specification:
      procedure Reducer(Accumulator : in out Accum_Type; Value : in Value_Type);
  * The reducer_name of a reduction_specification denotes a reducer
    subprogram.
  * The expected type of an initial_value_expression of a
    reduction_specification is that of subtype Accum_Type.
  * The expected type of the expression of the iterated_element_association
    of a value_sequence is that of subtype Value_Type.
Legality Rules
If the reduction_attribute_reference has a value_sequence with the
reserved word parallel, the subtypes Accum_Type and Value_Type
shall statically match.
If the identifier of a reduction_attribute_designator is
Parallel_Reduce, the subtypes Accum_Type and Value_Type shall
statically match.
Static Semantics
A reduction_attribute_reference denotes a value, with nominal subtype
being the subtype of the first parameter of the subprogram denoted by the
reducer_name.
Dynamic Semantics
For the evaluation of a value_sequence, the
iterated_element_association is elaborated, then an iteration is
performed, and for each value conditionally produced by the iteration (see
5.5 and 5.5.2), the associated expression is evaluated with the loop
parameter having this value, to produce a result that is converted to
Value_Type, and used to define the next value in the sequence.
If the value_sequence does not have the reserved word parallel, it is
produced as a single sequence of values by a single logical thread of
control. If the reserved word parallel is present in the value_sequence,
the enclosing reduction_attribute_reference is a parallel construct, and
the sequence of values is generated by a parallel iteration (as
defined in 5.5, 5.5.1, and 5.5.2), as a set of nonempty, nonoverlapping
contiguous chunks (subsequences) with one logical thread of control (see
clause 9) associated with each subsequence. If there is a
chunk_specification, it determines the maximum number of chunks, as
defined in 5.5; otherwise the maximum number of chunks is implementation defined.
For a value_sequence V, the following attribute is defined:
 V'Reduce(Reducer, Initial_Value)

This attribute represents a reduction expression, and is in the
form of a reduction_attribute_reference.
The evaluation of a use of this attribute begins by evaluating the
parts of the reduction_attribute_designator (the reducer_name
Reducer and the initial_value_expression Initial_Value), in an
arbitrary order. It then
initializes the accumulator of the reduction expression to the value
of the initial_value_expression (the initial value). The
value_sequence V is then evaluated.
If the value_sequence does not have the reserved word
parallel, each value of the value_sequence is passed, in order,
as the second (Value) parameter to a call on Reducer, with the first
(Accumulator) parameter being the prior value of the accumulator,
saving the result as the new value of the accumulator. The reduction
expression yields the final value of the accumulator.
If the reserved word parallel is present in a
value_sequence, then
the (parallel) reduction expression is a parallel construct and the
sequence has been partitioned into one or more subsequences (see
above) each with its own separate logical thread of control.
Each logical thread of control creates a local accumulator for
processing its subsequence. The accumulator for a subsequence is initialized
to the first value of the subsequence, and calls on Reducer start with the
second value of the subsequence (if any). The result for the
subsequence is the final value of its local accumulator.
After all logical threads of control of a parallel reduction
expression have completed, Reducer is called for each
subsequence, in the original sequence order, passing the local
accumulator for that subsequence as the second (Value) parameter, and
the overall accumulator (initialized above to the initial
value) as the first (Accumulator) parameter, with the result saved
back in the overall accumulator. The parallel reduction expression
yields the final value of the overall accumulator.
If the evaluation of the value_sequence yields an empty
sequence of values, the reduction expression yields the initial value.
If an exception is propagated by one of the calls on Reducer,
that exception is propagated from the reduction expression. If different
exceptions are propagated in different logical threads of control, one is
chosen arbitrarily to be
propagated from the reduction expression as a whole.
For a prefix X of an
array type (after any implicit dereference), or that denotes an iterable container
object (see 5.5.1), the following attributes are defined:
 X'Reduce(Reducer, Initial_Value)
   X'Reduce is a reduction expression that yields a result equivalent to
   replacing the prefix of the attribute with the value_sequence:
      [for Item of X => Item]
 X'Parallel_Reduce(Reducer, Initial_Value)
   X'Parallel_Reduce is a reduction expression that yields a result
   equivalent to replacing the attribute identifier with Reduce and the
   prefix of the attribute with the value_sequence:
      [parallel for Item of X => Item]
Bounded (Run-Time) Errors
For a parallel reduction expression, it is a bounded error if the reducer
subprogram is not associative. That is, for any arbitrary values A, B, C of
subtype Value_Type and a reducer function R, the result of
R (A, R (B, C)) should equal that of
R (R (A, B), C). The possible consequences
are Program_Error, or a result that does not match the equivalent sequential
reduction expression due to the order of calls on the reducer subprogram being
unspecified in the overall reduction. Analogous rules apply in the case of a
reduction procedure.
Examples
An expression function that returns its result as a Reduction Expression:
function Factorial(N : Natural) return Natural is
([for J in 1..N => J]'Reduce("*", 1));
An expression function that computes the Sine of X using a Taylor expansion:
function Sine (X : Float; Num_Terms : Positive := 5) return Float is
([for I in 1..Num_Terms => (-1.0)**(I-1) * X**(2*I-1)/Float(Factorial(2*I-1))]
'Reduce("+", 0.0));
A reduction expression that outputs the sum of squares:
Put_Line ("Sum of Squares is" &
Integer'Image([for I in 1 .. 10 => I**2]'Reduce("+", 0));
An expression function to compute the value of Pi:
 -- See 3.5.7.
function Pi (Number_Of_Steps : Natural := 10_000) return Real is
(1.0 / Real (Number_Of_Steps) *
[for I in 1 .. Number_Of_Steps =>
(4.0 / (1.0 + ((Real (I) - 0.5) * (1.0 / Real (Number_Of_Steps)))**2))]
'Reduce("+", 0.0));
Calculate the sum of elements of an array of integers:
A'Reduce("+",0)  See 4.3.3.
Determine if all elements in a two-dimensional array of booleans are
set to true:
Grid'Reduce("and", True) -- See 3.6.
Calculate the minimum value of an array of integers in parallel:
A'Parallel_Reduce(Integer'Min, Integer'Last)
A parallel reduction expression used to calculate the mean of the
elements of a two-dimensional array of subtype Matrix (see 3.6) that
are greater than 100.0:
type Accumulator is record
Sum : Real; -- See 3.5.7.
Count : Integer;
end record;
function Accumulate (L, R : Accumulator) return Accumulator is
(Sum => L.Sum + R.Sum,
Count => L.Count + R.Count);
function Average_of_Values_Greater_Than_100 (M : Matrix) return Real is
(declare
Acc : constant Accumulator :=
[parallel for Val of M when Val > 100.0 => (Val, 1)]
'Reduce(Accumulate, (Sum => 0.0, Count => 0));
begin
Acc.Sum / Real(Acc.Count));
!discussion
Examples of the use of combiners seem mainly to be examples that don't
benefit from parallel execution (but combiners are only necessary for
parallel execution) or are situations that would be better written as an
aggregate with an iterator. We propose that aggregates can also specify
parallel execution (see AI12-0349-1), so combiners are almost completely
redundant and we can eliminate them without losing significant functionality.

Once all of the rules specific to combiners were removed, it became apparent
that there was no rule requiring an associative reducer subprogram for
parallel reduction. That is necessary in order to get a consistent and
portable result from a reduction, as the actual number of chunks used and
the split of elements/components are both unspecified. (The maximum number
of chunks can be specified, but the implementation doesn't have to use that
number. There are restrictions on allowed splits, but not enough to determine
the number of elements in each chunk.)
We do that with a Bounded Error similar to the original one.
For example, the following reduction is welldefined:
Str : String := ...
type Hash_Type is mod <some-prime-number>;
function Hash (Accum, Value : in Hash_Type) return Hash_Type is
(2*Accum + Value);
... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0);
The reduction here creates a hash value for Str. But the similar parallel
reduction triggers the Bounded Error:
... := [parallel for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0);
This makes the result (ahem) hash, since Hash is not associative:
Hash (A, Hash (B, C)) = 2A + 2B + C -- Unintended result!
Hash (Hash (A, B), C) = 4A + 2B + C -- Intended result.
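The chunking dependence can be checked concretely by modeling the parallel
semantics outside Ada. The Python sketch below is illustrative only: the
helper name parallel_reduce, the particular chunk-splitting scheme, and the
modulus 1009 are assumptions of this sketch, not part of the AI's wording.
Each chunk is folded with a local accumulator seeded by the chunk's first
value, and the per-chunk results are then combined in the original sequence
order, as the Dynamic Semantics above describe.

```python
# Illustrative model of chunked parallel reduction (not part of the AI).
def parallel_reduce(values, reducer, initial, chunks):
    """Split `values` into at most `chunks` nonempty contiguous
    subsequences; fold each with a local accumulator seeded by its first
    value; then combine the per-chunk results, in original order, into an
    overall accumulator seeded with `initial`."""
    n = len(values)
    bounds = [i * n // chunks for i in range(chunks + 1)]
    subseqs = [values[bounds[i]:bounds[i + 1]] for i in range(chunks)]
    subseqs = [s for s in subseqs if s]   # drop any empty split

    acc = initial
    for s in subseqs:
        local = s[0]                      # local accumulator := first value
        for v in s[1:]:
            local = reducer(local, v)
        acc = reducer(acc, local)         # combine in sequence order
    return acc

data = [7, 1, 4, 9, 2, 5]

# An associative reducer ("+") yields one result for every chunking:
sums = {parallel_reduce(data, lambda a, v: a + v, 0, k) for k in range(1, 7)}

# The non-associative Hash above (2*Accum + Value, modulo a prime):
def hash_reduce(accum, value):
    return (2 * accum + value) % 1009

hashes = {parallel_reduce(data, hash_reduce, 0, k) for k in range(1, 7)}

print(sums)         # a single value, whatever the chunking
print(len(hashes))  # more than one: the result depends on the chunking
```

Running the model over every chunk count from 1 to 6 gives a single value
for "+" but several distinct values for Hash, which is exactly the
nonportability the Bounded Error is intended to flag.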
!ASIS
Whatever is needed is covered by the original AIs; if anything, a combiner
parameter can be dropped from something.
!ACATS test
We need a B-Test to check the new rule that the Accum_Type and Value_Type
are the same for parallel iterators. Otherwise, tests for the original AIs
should cover everything (and tests of combiners can be dropped).
!appendix
From: Brad Moore
Sent: Sunday, October 13, 2019 11:41 AM
At the Lexington meeting earlier this month, we decided to carve out the
example of a Reduction expression involving a combiner function out of
AI12-0312-1, since we had not yet come up with a good example.
We agreed that once we had such an example, that could be handled in a
separate AI.
While I haven't yet come up with an example, I have thought of another that
would be good to add.
In C++, there are examples of using variadic template syntax to create
functions to add an arbitrary list of numbers.
I was thinking about how best to do something like this in Ada.
I believe an array aggregate in combination with a reduction expression would
work. Something like:
Append after 4.2.1 (9.d/5)
-- A reduction expression that computes the sum of an array aggregate
Total_Grocery_Bill : constant Money :=
(19.99, 1.49, 5.36, 75.66)'Reduce("+", 0); -- See 3.5.9
We don't have an example showing how an array aggregate can be used in
combination with a reduction expression.
This uses round braces of an array aggregate rather than the square braces of
a value sequence, which also would be a good thing to show.
****************************************************************
From: Tucker Taft
Sent: Sunday, October 13, 2019 12:03 PM
> We don't have an example showing how an array aggregate can be used in
> combination with a reduction expression.
I don't think this is legal as written. If it is not a value sequence, then
I think the aggregate needs to be part of a qualified expression to be legal
as a "prefix" of an attribute. See latest Ada 202X RM, 4.5.10(2/5) and
4.1(2/5, 4).
> This uses round braces of an array aggregate rather than the square
> braces of a value sequence, which also would be a good thing to show.
I am not sure, but I thought we agreed to use '[' ... ']' for most array
aggregate examples. And I think here in particular it might be just confusing
to use parentheses, since I think we are trying to make these two uses of
'Reduce look as similar as possible, so the user doesn't have to worry about
the subtle differences between them, and instead can focus on the commonality.
****************************************************************
From: Brad Moore
Sent: Sunday, October 13, 2019 12:13 PM
Also, while on the topic of reduction examples, I was experimenting in
getting the To_Roman_Number function example of AI12-0312, which converts a
string of Roman characters into a decimal value, to be proven in SPARK. I did
manage to get something that SPARK proves. Though my implementation of the
function is different than the one shown in AI12-0312.
In particular, I was interested in creating a precondition that only allows
what I consider as valid sequences of Roman Digits. (e.g. No more than 3
characters of any kind in a row)
It turns out that different approaches to create such a validation can end up
being quite complex. I could have written an Is_Valid function and hidden
complex rules in there, but I was hoping to create an expression that shows
the rules in the contract rather than hidden in the implementation.
I did manage to do this.
I believe we could replace the precondition we currently have in the example
with:
type Roman_Thousands is (M, MM, MMM) with Ghost;
type Roman_Hundreds is (C, CC, CCC, CD, D, DC, DCC, DCCC, CM) with Ghost;
type Roman_Tens is (X, XX, XXX, XL, L, LX, LXX, LXXX, XC) with Ghost;
type Roman_Ones is (I, II, III, IV, V, VI, VII, VIII, IX) with Ghost;
function To_Roman_Number (Roman : Roman_String) return Roman_Number
with
Pre =>
(for some I in Roman_Thousands'Range =>
(I'Image = Roman or else
(for some J in Roman_Hundreds'Range =>
J'Image = Roman or else I'Image & J'Image = Roman
or else
(for some K in Roman_Tens'Range =>
K'Image = Roman or else I'Image & K'Image = Roman
or else J'Image & K'Image = Roman or else
I'Image & J'Image & K'Image = Roman or else
(for some L in Roman_Ones'Range =>
L'Image = Roman or else I'Image & L'Image = Roman
or else J'Image & L'Image = Roman or else
K'Image & L'Image = Roman or else
I'Image & J'Image & L'Image = Roman
or else I'Image & K'Image & L'Image = Roman
or else I'Image & J'Image & K'Image & L'Image = Roman
or else J'Image & K'Image & L'Image = Roman)))));
It requires that the input string must have at least one match with one of the
images of the enumerations above, and also enforces all the rules one would
expect for Roman numerals. (e.g. The thousands comes before hundreds, which
comes before tens, etc)
However, I am not suggesting we should necessarily update the example in the
RM with this, unless others thought it was worthwhile. It may be considered
to be too complex for a simple example. On the other hand, it is a precise
precondition contract for the function.
I did find this to be an interesting exercise, nonetheless.
****************************************************************
From: Brad Moore
Sent: Sunday, October 13, 2019 12:42 PM
Thinking more about this, I think it would be nice to show this in the RM, as
an alternative to what we have now, instead of as a replacement, perhaps as
a note?
It demonstrates that contracts can have varying preciseness, and also, this
could be an example of the ghost aspect, which we do not have any examples for
yet.
****************************************************************
From: Brad Moore
Sent: Sunday, October 13, 2019 12:55 PM
> I am not sure, but I thought we agreed to use '[' ... ']' for most
> array aggregate examples. And I think here in particular it might be
> just confusing to use parentheses, since I think we are trying to make
> these two uses of 'Reduce look as similar as possible, so the user
> doesn't have to worry about the subtle differences between them, and instead
> can focus on the commonality.
Thanks Tucker for reminding me. This gives all the more reason for showing
an example of how to do this in the RM.
So instead, I would suggest:
Append after 4.2.1 (9.d/5)
-- A reduction expression that computes the sum of an array aggregate
type Expenses is array (Positive range <>) of Money; -- See 3.5.9
Total_Grocery_Bill : constant Money
:= Expenses'[19.99, 1.49, 5.36, 75.66]'Reduce("+", 0);
****************************************************************
From: Brad Moore
Sent: Tuesday, October 15, 2019 9:24 PM
I am definitely having second thoughts about this example, because why not
simplify this further and just write an expression with all the "+"s, and not
bother with 'Reduce? i.e. Nobody would write this.
So I thought I'd better come up with a better example of an
aggregate/reduction expression combination. It turns out that coming up with a
good example using aggregate syntax with 'Reduce is almost as difficult as
coming up with one using a combiner.
A better example would be to use a reduction that is not an operator, such as
'Min or 'Max, and probably better to involve variables where the values are not known.
I think the following would work a lot better as an example.
type Expenses is array (Positive range <>) of Money; -- See 3.5.9
-- A reduction expression that computes the lowest estimated cost for a repair
-- using an array aggregate
function Cheapest_Estimate (First_Estimate, Second_Estimate,
Third_Estimate : Money) is
(Expenses'[First_Estimate, Second_Estimate, Third_Estimate]'Reduce(Money'Min, Money'Last));
For a function like this, if there were more parameters, then one would likely
want to use an array parameter instead, but then this would not be an example
involving aggregate syntax.
****************************************************************
From: Jean-Pierre Rosen
Sent: Wednesday, October 16, 2019 12:10 AM
> I am definitely having second thoughts about this example, because why
> not simplify this further and just write an expression with all the
> "+"s, and not bother with 'Reduce? i.e. Nobody would write this.
Agreed!
...
>function Cheapest_Estimate (First_Estimate, Second_Estimate,
> Third_Estimate : Money) is
> (Expenses'[First_Estimate, Second_Estimate, Third_Estimate]'Reduce(Money'Min, Money'Last));
I'm afraid nobody would write this either. Honestly, if you can spell
out all the values of an aggregate, it is simpler to write the reduction
explicitly rather than using 'Reduce. Or you need really huge
aggregates (as happens sometimes with state machines), but that would be
too much for an RM example...
****************************************************************
From: Brad Moore
Sent: Wednesday, October 16, 2019 10:37 AM
Or rather nobody would write this once they realized there was a better way. ;)
I originally was comparing the reduction expression to writing an if
expression without the use of 'Max, and the reduction expression is quite
a bit simpler than that.
But if you write the expression function as two calls to 'Max, where the
second call is nested in the first, I agree that beats using a reduction
expression.
So, I think we can conclude from this that the use of aggregate syntax in
a reduction expression is not a useful combination, and that no such example
should be included in the RM.
****************************************************************
From: Edmond Schonberg
Sent: Sunday, October 13, 2019 5:50 PM
> At the Lexington meeting earlier this month, we decided to carve out
> the example of a Reduction expression involving a combiner function out
> of AI12-0312-1, since we had not yet come up with a good example.
...
I have also been unable to find a convincing example of use of a combiner.
I can't find any algorithms that would benefit from this additional
complication, and would suggest that we leave it out of Reduce. All examples
that use reduction parallelize if the operation has the proper associativity,
and the combiner adds nothing except description and implementation
complexity.
****************************************************************
From: Tucker Taft
Sent: Sunday, October 13, 2019 8:07 PM
I'd like to give it one more try. I am pretty sure other reduction primitives
allow for two distinct operations. I'll see if I can find examples associated
with those. The classic example is where you are concatenating characters into
a string, or elements into a set, where there is a distinction between adding
an element to a container, and combining two containers. There can be
significant overhead in first turning an element into a singleton container,
so it is nice to be able to skip over that step.
****************************************************************
From: Jeffrey Cousins
Sent: Monday, October 14, 2019 2:57 AM
> would suggest that we leave it out of Reduce. All examples that use reduction
Ed, just to clarify, do you mean leave it out of Reduce, or just leave it out
of the example of Reduce?
****************************************************************
From: Edmond Schonberg
Sent: Monday, October 14, 2019 6:42 AM
I mean leave it out of Reduce. It seems curious to propose a new feature
and then have to look long and hard to find a possible use for it. The
concatenation example is an unusual idiom that is only remotely connected to
reduction and is not enough to justify the added complexity.
****************************************************************
From: Tucker Taft
Sent: Monday, October 14, 2019 1:34 PM
OK, I did a bit of research. OpenMP allows user-defined reductions with a
separate "combiner" operation. One example that popped out is that when you
are summing up a large array, it is quite common that the sum needs more range
than the individual elements of the array. So the "combiner" might be
operating on long integers, or double-precision floats, while the individual
elements are shorter integers or single-precision floats. On the other hand,
you generally need to convert each element before adding it into the ongoing
total, so unless you have a special operation that combines a conversion and
an add, there is no special requirement for a separate operation, though
clearly a pure array-oriented "reduce" without any ability to convert the
value first to a larger type would be a problem. However, for the Reduce
operation we have proposed that operates on a value sequence rather than an
array, we provide the opportunity to specify a conversion (or some more
general "map" operation as part of the "map/reduce" paradigm), so a
conversion by itself does not seem enough to justify the additional operation.
I am going to look at examples for OpenMP "declare reduction" -- so far they
deal with the case that the operation is not one of the operators that OpenMP
has builtin knowledge of, such as what is the "identity" of the operator,
etc. We already account for that situation. For what it is worth, here is
Michael Klemm's presentation on OpenMP 4.0, where the "declare reduction" was
introduced:
https://www.hpc2n.umu.se/sites/default/files/2.04%20Outlook%20on%20OpenMP%204.pdf
His example is on slide 11.
More to come!
****************************************************************
From: Tucker Taft
Sent: Monday, October 14, 2019 6:55 PM
After more research, I am coming around to Ed's view. I think given the
"rollyourown" reduction using a chunk parameter, the fact that the
element/container examples are best handled with a container aggregate, and
similarly the character/string example can be handled with an array aggregate
based on an iterator, we have enough to make the "leap" to eliminate the
option where the reducer and the combiner would be different. So in the
simplified version we would only have a reducer, and it would be required to
have the same type as both inputs and as output. We could keep the
reducer/combiner option for a future standard if we get a huge groundswell
for it... ;)
But it is worth remembering that the container aggregate, generalized array
aggregate, and the chunk-parameter mechanism are what make this unnecessary,
so those are providing important value.
****************************************************************
From: Brad Moore
Sent: Tuesday, October 15, 2019 9:03 PM
I think it is important to also remember that the combiner is only needed for
parallelism. Without it, the reducer still can work, but would need to be
sequential if the parameters of the reducer are of different types.
That is, I don't see that container aggregate and generalized array aggregate
syntax replace use cases for combiners, in that the parallel keyword is not
allowed currently in aggregate syntax, and so in theory would
not facilitate parallelism in cases where a combiner would be needed.
For parallelism, though, based on previous examples, the combiner parameter
was pretty much only needed for reductions involving concatenation (e.g. for
arrays or vector containers), or similar concatenation-like operations such as
populating a container by appending items (e.g. to a linked list, vector, or
into a set container).
For something like concatenating letters into a string, it is difficult to
imagine an example where parallelism would be worthwhile. It could be, if the
amount of processing needed to generate each letter was significant, but not
likely to be a real example.
Concatenating matrices into a vector is likely a more realistic example that
could benefit from parallelism. But then it seems unlikely to me that one
would be manufacturing matrices out of thin air, and using concatenation-like
operations.
Rather, it seems more likely that there would already be an array of data to
be used for generating the vector of matrices. (e.g., perhaps multiplying
vectors together to produce matrices?)
That is, the number of elements of the resulting array of matrices is known
prior to processing, and could be used to size the result before the
processing begins. In that case, as Tucker suggests, I believe a chunked
parallel loop could be used with a Vector container to place each computed
matrix into the correct slot of the result.
So I think I agree that while there may be some cases where a combiner could
be useful, it does seem that the possible uses are few and far between in
practice, and probably not worth the additional complexity at this time,
unless as Tucker says, there is a huge groundswell and demand in the future
to support cases that have so far been eluding us.
In addition to the roll-your-own chunking loop syntax, I think the "map" part
of the map/reduce syntax for 'Reduce also helps to eliminate the need for
combiners, in cases where the map can convert a value into a matching type for
a reducer without introducing too much overhead.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, November 26, 2019 8:19 PM
>>Ed, just to clarify, do you mean leave it out of Reduce, or just leave it
>>out of the example of Reduce?
>I mean leave it out of Reduce. It seems curious to propose a new feature
>and then have to look long and hard to find a possible use for it. The
>concatenation example is an unusual idiom that is only remotely connected
>to reduction and is not enough to justify the added complexity.
Tucker eventually replied:
>After more research, I am coming around to Ed's view.
Brad eventually concurred as well.
Since this is "just" deleting some text, I'm happy to take a swing at
writing this up formally. And it would be valuable to see how much simpler
the wording would be without it. However, it's going to take several hours
to do (this is a large section, and other simplifications to the wording are
possible once the combiner stuff is gone). So I wanted to verify that we are
proposing to remove all vestiges of the combiner from the wording before
spending the time.
A second question: are we going to keep the possibility for a *sequential*
reduce to have operands of different types? I don't see much reason to
remove it (it probably simplifies the wording slightly [mainly getting rid
of a Legality Rule for parallel reduces], but it doesn't seem to have much
impact on the implementation which is the same either way). Probably the
only thing it complicates is resolution, and that is a mess with or without
the extra capability.
Thoughts?
****************************************************************
From: Brad Moore
Sent: Tuesday, November 26, 2019 11:02 PM
There is one issue that likely involves adding some text, however.
The thought was that array aggregates were a better way to write
concatenation, and would be a better way to express parallelism for such a
problem.
However, currently array aggregates do not allow the parallel keyword.
If we could allow that, then I think it is a no-brainer for removing the
combiner from reduce. However, without that, we would then be losing the
ability to do explicit parallel concatenation expressions. So, should we allow
"parallel" on array aggregates when they involve an
iterated_component_association?
> A second question: are we going to keep the possibility for a
> *sequential* reduce to have operands of different types? I don't see
> much reason to remove it (it probably simplifies the wording slightly
> [mainly getting rid of a Legality Rule for parallel reduces], but it
> doesn't seem to have much impact on the implementation which is the
> same either way). Probably the only thing it complicates is
> resolution, and that is a mess with or without the extra capability.
>
> Thoughts?
I would think that it is probably good to leave support for the *sequential*
case, unless someone comes up with good reason to disallow that.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, November 26, 2019 11:15 PM
> However, currently array aggregates do not allow the parallel keyword.
> If we could allow that, then I think it is a no-brainer for removing
> the combiner from reduce. However, without that, we would then be
> losing the ability to do explicit parallel concatenation expressions.
> So, should we allow "parallel" on array aggregates when they involve
> an iterated_component_association?
That was in a separate thread, and I didn't see it until after I reread the
first one. Anyway, it doesn't have anything to do with 4.5.10 as it is solely
about aggregates, so it seems OK to treat that separately. (Especially as
someone needs to explain what needs to be added to the container aggregates
and to the Ada.Containers packages in order to enable real parallel operation.
That's not clear to me, especially for the associative containers.)
> > A second question: are we going to keep the possibility for a
> > *sequential* reduce to have operands of different types? I don't see
> > much reason to remove it (it probably simplifies the wording
> > slightly [mainly getting rid of a Legality Rule for parallel
> > reduces], but it doesn't seem to have much impact on the
> > implementation which is the same either way). Probably the only
> > thing it complicates is resolution, and that is a mess with or without the
> > extra capability.
>
> I would think that it is probably good to leave support for the
> *sequential* case, unless someone comes up with good reason to
> disallow that.
That's my position, too. It doesn't change the mental model (unlike the
separate combiner function), it doesn't change or complicate the
implementation, and it might be useful in some contexts. We've already done
the hard work of wording it, so leave it.
****************************************************************
From: Arnaud Charlet
Sent: Wednesday, November 27, 2019 2:38 AM
> Since this is "just" deleting some text, I'm happy to take a swing at
> writing this up formally. And it would be valuable to see how much
> simpler the wording would be without it. However, it's going to take
> several hours to do (this is a large section, and other
> simplifications to the wording are possible once the combiner stuff is
> gone). So I wanted to verify that we are proposing to remove all
> vestiges of the combiner from the wording before spending the time.
Yes, let's remove it.
****************************************************************
From: Tucker Taft
Sent: Wednesday, November 27, 2019 7:37 AM
>> ...
>> I would think that it is probably good to leave support for the
>> *sequential* case, unless someone comes up with good reason to
>> disallow that.
>
> That's my position, too. It doesn't change the mental model (unlike
> the separate combiner function), it doesn't change or complicate the
> implementation, and it might be useful in some contexts. We've already
> done the hard work of wording it, so leave it.
That seems fine. So we get rid of the "combiner" parameter and all mention
of it, and in the presence of "parallel" we require the types of the
parameters of the "reducer" be the same.
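[Editor's sketch: the reason the parallel case needs matching parameter types can be illustrated with a small simulation, written in Python rather than Ada purely as a hypothetical illustration. A sequential reduce may accumulate into a different type than its values, but a parallel reduction must combine partial accumulators with the same reducer, so both parameters must share a type.]

```python
from functools import reduce

# A reducer whose accumulator type (str) differs from its value type (int).
# This works fine for a strictly sequential, left-to-right reduction.
def append_digit(acc: str, value: int) -> str:
    return acc + str(value)

print(reduce(append_digit, [1, 2, 3], ""))  # left-to-right fold over ints

# In a parallel reduction, the partial results are strings, but append_digit
# expects an int as its second argument, so the partials could not be combined
# without a separate combiner function -- which is exactly what this AI removes.
```

With combiners gone, requiring Accum_Type = Value_Type is what makes the reducer itself usable for combining partial results.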
****************************************************************
From: Randy Brukardt
Sent: Tuesday, January 28, 2020 8:41 PM
In updating the AARM notes for this AI, I discovered that the revised Bounded
Error in the AI has no relationship to any remaining semantics. Thus it could
be dropped.
However, that made it clear that there was no mention of the fact that a
non-associative reducer in a parallel reduction generates unspecified results.
(I had thought that the existing Bounded Error covered that case, but it
obviously did not.)
After discussion with Steve and Tucker, we agreed that we still need a Bounded
Error for a non-associative reducer. We then crafted the following text:
For a parallel reduction expression, it is a bounded error if the reducer
subprogram is not associative. That is, for any arbitrary values of subtype
Value_Type A, B, C and a reducer function R, the result of R (A, R (B, C))
should produce a result equal to R (R (A, B), C). The possible consequences
are Program_Error, or a result that does not match the equivalent sequential
reduction expression due to the order of calls on the reducer subprogram being
unspecified in the overall reduction. Analogous rules apply in the case of a
reduction procedure.
AARM Reason: In a sequential reduction expression, the reducer subprogram is
called in a left-to-right order, whereas in a parallel reduction expression,
the reducer subprogram is called in an order that depends on the number of
logical threads of control that execute the reduction. If the reducer is
associative, this order does not matter, but in other cases, very different
results are possible. While one can specify the maximum number of chunks, the
actual number of chunks is unspecified. Thus, to get a consistent and portable
result, an associative reducer is required. We define this as a Bounded
(Run-Time) Error to provide a stern warning about the required nature of the
reducer subprogram and to let compilers detect the problem when possible.
AARM To be honest: In this rule, “equal” means semantically equal. We don't
care whether the bit patterns differ, only that the results mean the same
thing. In particular, if the primitive equality is user-defined, that equality
would be the one used to determine whether this rule is violated.
[Randy's note: This last AARM note is carried over from the previous Bounded
Error; it still seems relevant.]
I also added the following to the !discussion of the AI:
Once all of the rules specific to combiners were removed, it became apparent
that there was no rule requiring an associative reducer subprogram for
parallel reduction. That is necessary in order to get a consistent and
portable result from a reduction, as the actual number of chunks used and the
split of elements/components are both unspecified. (The maximum number of
chunks can be specified, but the implementation doesn't have to use that
number. There are restrictions on allowed splits, but not enough to determine
the number of elements in each chunk.)
We do that with a Bounded Error similar to the original error.
For example, the following reduction is well-defined:

   Str : String := ...
   type Hash_Type is mod <some-prime-number>;

   function Hash (Accum, Value : in Hash_Type) return Hash_Type is
      (2*Accum + Value);

   ... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0);
The reduction here creates a hash value for Str. But the similar parallel
reduction triggers the Bounded Error:

   ... := [parallel for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0);

This makes the result (ahem) hash, since Hash is not associative:

   Hash (A, Hash (B, C)) = 2*A + 2*B + C -- Unintended result!
   Hash (Hash (A, B), C) = 4*A + 2*B + C -- Intended result.
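[Editor's sketch: the divergence above can be reproduced with a small Python simulation, hypothetical and not part of the AI; the modulus 97 merely stands in for the elided <some-prime-number>, and the chunked schedule is just one of the unspecified orders a parallel reduction might use.]

```python
from functools import reduce

# The reducer from the example: Hash(Accum, Value) = 2*Accum + Value.
# It is not associative, so call order changes the result.
def hash_reduce(acc, value):
    return (2 * acc + value) % 97

def sequential_reduce(values, init):
    # Sequential 'Reduce: strict left-to-right order of reducer calls.
    return reduce(hash_reduce, values, init)

def chunked_reduce(values, init, chunks):
    # One possible parallel schedule: reduce each chunk from the initial
    # value, then combine the partial results with the same reducer.
    size = -(-len(values) // chunks)  # ceiling division
    partials = [sequential_reduce(values[i:i + size], init)
                for i in range(0, len(values), size)]
    return reduce(hash_reduce, partials)

data = [ord(c) for c in "reduction"]
print(sequential_reduce(data, 0))  # the well-defined sequential result
print(chunked_reduce(data, 0, 2))  # a different value: the Bounded Error case
```

For this input the two orders produce different hashes, which is exactly the unspecified-result hazard the Bounded Error is meant to flag.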

We're considering this as part of the editorial review of the AI. Since this
is unrelated to the purpose of this AI (it's fixing a bug in the underlying
AIs), if anyone wants to discuss it properly, we would make it into a separate
AI for discussion.
It should be noted that we considered just using a user Note for this purpose.
The Dynamic Semantics defines the order of calls on Reduce (and that it is
different than the sequential case), so we could just note that the result
depends on two properties that are mostly unspecified if the Reducer is
non-associative.
Tucker and Steve both felt this was a new case unlike existing order
dependencies and that it therefore needs greater visibility than just a
user note.
Again, if anyone wants a full discussion on this point, please speak up. (And
feel free to whack me with any typos I managed to add to the above.)
****************************************************************
From: Brad Moore
Sent: Tuesday, January 28, 2020 9:27 PM
> After discussion with Steve and Tucker, we agreed that we still need a
> Bounded Error for a non-associative reducer. We then crafted the
> following
> text:
>
> For a parallel reduction expression, it is a bounded error if the
> reducer subprogram is not associative. That is, for any arbitrary
> values of subtype Value_Type A, B, C and a reducer function R, the
> result of R (A, R (B, C)) should produce a result equal to R (R (A,
> B), C). The possible consequences are Program_Error, or a result that
> does not match the equivalent sequential reduction expression due to
> the order of calls on the reducer subprogram being unspecified in the
> overall reduction. Analogous rules apply in the case of a reduction procedure.
...
> We do that with a Bounded Error similar to the original error.
>
> For example, the following reduction is well-defined:
>
> Str : String := ...
> type Hash_Type is mod <some-prime-number>;
>
> function Hash (Accum, Value : in Hash_Type) return Hash_Type is
> (2*Accum + Value);
>
> ... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0);
>
> The reduction here creates a hash value for Str. But the similar
> parallel reduction triggers the Bounded Error:
>
> ... := [parallel for I in Str'range =>
> Character'Pos(Str(I))]'Reduce (Hash, 0);
>
> This makes the result (ahem) hash, since Hash is not associative:
>
> Hash (A, Hash (B, C)) = 2*A + 2*B + C -- Unintended result!
> Hash (Hash (A, B), C) = 4*A + 2*B + C -- Intended result.
>
Another example that is more likely to be seen in practice:
Technically, floating-point operations such as addition are not associative
either, though they are still useful for parallel execution.
Non-associativity can sometimes be in the eye of the beholder. If you are
looking for a precise bit pattern for a hash lookup, you might care more
about the non-associativity, but if you are only looking for an
approximation, the non-associativity of the floating point can be overlooked.
Even the approximation has tolerance limits. If I use 32-bit floating-point
addition of floating-point values from 1.0 to N, I get output matching the
associative result for values of N up to about 5700.0.
After that, errors start to appear, but the result is still a reasonable
approximation. For values of N around 100_000_000, however, the parallel
results are complete garbage.
Using 64-bit floating point, however, gives associative output for values of
N at least as high as 4_000_000_000.
(I haven't hit the limit of non-associativity there yet; the processing time
starts to get too long to wait for the results.)
This illustrates the bounded error proposed in Randy's submission.
Whether the error is acceptable depends on the needs of the programmer, and
the bounds of the calculation.
I agree that a bounded error is a better approach than a user note in the
wording. I can't see a compiler always being able to decide where the line
should be drawn for what is an acceptable level of associativity for cases
such as this.
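[Editor's sketch: Brad's point can be demonstrated without long-running sums, again as a hypothetical Python illustration rather than anything from the AI. Reordering floating-point additions, as a parallel reduction does when it combines per-chunk partial sums, changes the rounded result even though both orders remain reasonable approximations.]

```python
import math

def sequential_sum(values):
    # Left-to-right fold, as in a sequential 'Reduce.
    total = 0.0
    for v in values:
        total += v
    return total

def pairwise_sum(values):
    # A tree-shaped order, similar to combining per-chunk partial sums.
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

# The classic counterexample: grouping changes the rounded result.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

data = [0.1, 0.2, 0.3, 0.4]
print(math.isclose(sequential_sum(data), pairwise_sum(data)))  # True
```

The two sums agree only to within rounding error, which is the "acceptable level of associativity" question a compiler cannot decide on the programmer's behalf.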
****************************************************************
From: Brad Moore
Sent: Tuesday, January 28, 2020 10:06 PM
> Even the approximation has tolerance limits. If I use 32-bit
> floating-point addition of floating-point values from 1.0 to N, I get
> output matching the associative result for values of N up to about
> 5700.0. After that, errors start to appear, but the result is still a
> reasonable approximation. For values of N around 100_000_000, however,
> the parallel results are complete garbage.
>
> Using 64-bit floating point, however, gives associative output for
> values of N at least as high as 4_000_000_000.
> (I haven't hit the limit of non-associativity there yet; the
> processing time starts to get too long to wait for the results.)
For the record,
Actually, for the 4_000_000_000 test, I was using 128-bit floating point
(GNAT's Long_Long_Float), not 64-bit (GNAT's Long_Float).
****************************************************************
From: Tucker Taft
Sent: Tuesday, January 28, 2020 11:11 PM
I am not sure floating point should be rolled into this. That is more of a
"normal" order dependence, and other standards like OpenMP see this as a
numerical analysis problem, rather than a real "bug". I would be reluctant
to allow a compiler to raise Program_Error in this case.
****************************************************************
Questions? Ask the ACAA Technical Agent