!standard 4.5.10(0) 20-01-28 AI12-0348-1/03 !class Amendment 20-01-08 !status Amendment 1-2012 20-01-15 !status ARG Approved 9-0-4 20-01-15 !status work item 20-01-08 !status received 19-10-13 !priority Medium !difficulty Easy !subject Remove Combiners from Reduction Expressions !summary Combiners are removed from reduction expressions. As a consequence, the Accum_Type and the Value_Type must be the same for parallel reductions. !problem Combiners seem necessary mostly for contrived uses of reduction expressions, and add substantial complexity to the model and resolution of reduction expressions. We should eliminate them. !proposal (See Summary.) !wording (See !corrigendum.) [Editor's note: Simplifying only the formatted version saves me from having to construct two parallel versions of this wording. Note that this wording is the entire subclause; many parts are unchanged, but enough change that it is easier to just repeat the entire thing.] !corrigendum 4.5.10(0) @dinsc Reduction expressions provide a way to map or transform a collection of values into a new set of values, and then summarize the values produced by applying an operation to reduce the set to a single value result. A reduction expression is represented as an @fa of the reduction attributes Reduce or Parallel_Reduce. @s8<@i> @xindent<@fa@fa<@ ::=@ >@hr @ @ @ @ @fa'@fa@hr @ @ |@ @fa'@fa> @xindent<@fa@fa<@ ::=@ >@hr @ @ @ @ '['@ [@b[(@fa)]]@ @fa@ ']'> @xindent<@fa@fa<@ ::=@ >@i@fa(@fa)> @xindent<@fa@fa<@ ::=@ >@i@fa,@ @i@fa> The @fa of a @fa shall not have a @i@fa, nor shall it have a @fa that has the reserved word @b. The @fa, if any, of a @fa shall be an @i@fa. @s8<@i> The expected type for a @fa shall be a single nonlimited type. In the remainder of this subclause, we will refer to nonlimited subtypes @i and @i of a @fa. These subtypes and interpretations of the @fas and @fas of a @fa are determined by the following rules: @xbullet<@i is a subtype of the expected type of the @fa.> @xbullet is either subtype conformant with the following specification:> @xcode< @b Reducer(Accumulator : @i; Value : @i) @b @i;> @xindent @xcode< @b Reducer(Accumulator : @b @i; Value : @b @i);> @xbullet@fa of a @fa denotes a reducer subprogram.> @xbullet@fa of a @fa is that of subtype @i.> @xbullet of the @fa of a @fa is that of subtype @i.> @s8<@i> If the @fa has a @fa with the reserved word @b, the subtypes @i and @i shall statically match. If the @fa of a @fa is Parallel_Reduce, the subtypes @i and @i shall statically match. @s8<@i> A @fa denotes a value, with nominal subtype being the subtype of the first parameter of the subprogram denoted by the @i@fa. @s8<@i> For the evaluation of a @fa, the @fa is elaborated, then an iteration is performed, and for each value conditionally produced by the iteration (see 5.5 and 5.5.2), the associated @fa is evaluated with the loop parameter having this value, to produce a result that is converted to Value_Type, and used to define the next value in the sequence. If the @fa does not have the reserved word @b, it is produced as a single sequence of values by a single logical thread of control. If the reserved word @b is present in the @fa, the enclosing @fa is a parallel construct, and the sequence of values is generated by a parallel iteration (as defined in 5.5, 5.5.1, and 5.5.2), as a set of non-empty, non-overlapping contiguous chunks (@i) with one logical thread of control (see clause 9) associated with each subsequence. 
If there is a @fa, it determines the maximum number of chunks, as defined in 5.5; otherwise the maximum number of chunks is implementation defined. For a @fa V, the following attribute is defined: @xhang<@xterm This attribute represents a @i, and is in the form of a @fa.> @xindent (the @i@fa Reducer and the @i@fa Initial_Value), in an arbitrary order. It then initializes the @i of the reduction expression to the value of the @i@fa (the @i). The @fa V is then evaluated.> @xindent does not have the reserved word @b, each value of the @fa is passed, in order, as the second (Value) parameter to a call on Reducer, with the first (Accumulator) parameter being the prior value of the accumulator, saving the result as the new value of the accumulator. The reduction expression yields the final value of the accumulator.> @xindent is present in a @fa, then the (parallel) reduction expression is a parallel construct and the sequence has been partitioned into one or more subsequences (see above) each with its own separate logical thread of control.> @xindent @xindent @xindent yields an empty sequence of values, the reduction expression yields the initial value.> @xindent For a @fa X of an array type (after any implicit dereference), or that denotes an iterable container object (see 5.5.1), the following attributes are defined: @xhang<@xtermX'Reduce is a reduction expression that yields a result equivalent to replacing the @fa of the attribute with the @fa:> @xcode< [@b Item @b X =@> Item]> @xhang<@xtermX'Parallel_Reduce is a reduction expression that yields a result equivalent to replacing the attribute @fa with Reduce and the @fa of the attribute with the @fa:> @xcode< [@b Item @b X =@> Item]> @s8<@i> For a parallel reduction expression, it is a bounded error if the reducer subprogram is not associative. That is, for any arbitrary values of subtype @i @i, @i, @i and a reducer function @i, the result of @i (@i, @i (@i, @i)) should produce a result equal to @i (@i (@i, @i), @i)). The possible consequences are Program_Error, or a result that does not match the equivalent sequential reduction expression due to the order of calls on the reducer subprogram being unspecified in the overall reduction. Analogous rules apply in the case of a reduction procedure. @s8<@i> An expression function that returns its result as a Reduction Expression: @xcode<@b Factorial(N : Natural) @b Natural @b ([@b J @b 1..N =@> J]'Reduce("*", 1));> An expression function that computes the Sine of X using a Taylor expansion: @xcode<@b Sine (X : Float; Num_Terms : Positive := 5) @b Float @b ([@b I @b 1..Num_Terms =@> (-1.0)**(I-1) * X**(2*I-1)/Float(Factorial(2*I-1))] 'Reduce("+", 0.0));> A reduction expression that outputs the sum of squares: @xcode I @b 1 .. 10 =@> I**2]'Reduce("+", 0));> An expression function to compute the value of Pi: @xcode<-- @ft<@i> @b Pi (Number_Of_Steps : Natural := 10_000) @b Real @b (1.0 / Number_Of_Steps * [@b I @b 1 .. 
Number_Of_Steps =@> (4.0 / (1.0 + ((Real (I) - 0.5) * (1.0 / Number_Of_Steps))**2))] 'Reduce("+", 0.0));> Calculate the sum of elements of an array of integers: @xcode>> Determine if all elements in a two-dimensional array of booleans are set to true: @xcode>> Calculate the minimum value of an array of integers in parallel: @xcode A parallel reduction expression used to calculate the mean of the elements of a two-dimensional array of subtype Matrix (see 3.6) that are greater than 100.0: @xcode<@b Accumulator @b Sum : Real; --@ft<@i< See 3.5.7.>> Count : Integer; @b;> @xcode<@b Accumulate (L, R : Accumulator) @b Accumulator @b (Sum =@> L.Sum + R.Sum, Count =@> L.Count + R.Count);> @xcode<@b Average_of_Values_Greater_Than_100 (M : Matrix) @b Real @b (@b Acc : @b Accumulator := [@b Val @b M @b Val @> 100.0 =@> (Val, 1)] 'Reduce(Accumulate, (Sum =@> 0, Count =@> 0)); @b Acc.Sum / Real(Acc.Count));> !discussion Examples of the use of combiners seem mainly to be examples that don't benefit from parallel execution (but combiners are only necessary for parallel execution) or are situations that would be better written as an aggregate with an iterator. We propose that aggregates can also specify parallel execution (see AI12-0349-1), so combiners are almost completely redundant and we can eliminate them without losing significant functionality. --- Once all of the rules specific to combiners are removed, it became apparent that there was no rule requiring an associative reducer subprogram for parallel reduction. That is necessary in order to get a consistent and portable result from a reduction, as the actual number of chunks used and the split of elements/components are both unspecified. (The maximum number of chunks can be specified, but the implementation doesn't have to use that number. There are restrictions on allowed splits, but not enough to determine the number of elements in each chunk.) We do that with a Bounded Error similar to the original error. For example, the following reduction is well-defined: Str : String := ... type Hash_Type is mod ; function Hash (Accum, Value : in Hash_Type) return Hash_Type is (2*Accum + Value); ... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0); The reduction here creates a hash value for Str. But the similar parallel reduction triggers the Bounded Error: ... := [parallel for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0); This makes the result (ahem) hash, since Hash is not associative: Hash (A, Hash (B, C)) = 2*A + 2*B + C -- Unintended result! Hash (Hash (A, B), C) = 4*A + 2*B + C -- Intended result. !ASIS Whatever is needed is covered by the original AIs; if anything, a combiner parameter can be dropped from something. !ACATS test We need a B-Test to check the new rule that the Accum_Type and Value_Type are the same for parallel iterators. Otherwise, tests for the original AIs should cover everything (and tests of combiners can be dropped). !appendix From: Brad Moore Sent: Sunday, October 13, 2019 11:41 AM At the Lexington meeting earlier this month, we decided to carve out the example of a Reduction expression involving a combiner function out of AI12-0312-1, since we had not yet come up with a good example. We agreed that once we had such an example, that could be handled in a separate AI. While I haven't yet come up with an example, I have thought of another that would be good to add. In C++, there are examples of using variadic template syntax to create functions to add an arbitrary list of numbers. 
I was thinking about how best to do something like this in Ada. I believe an array aggregate in combination with a reduction expression would work. Something like; Append after 4.2.1 (9.d/5) -- A reduction expression that computes the sum of an array aggregate Total_Grocery_Bill : constant Money := (19.99, 1.49, 5.36, 75.66)'Reduce("+", 0); -- See 3.5.9 We don't have an example showing how an array aggregate can be used in combination with a reduction expression. This uses round braces of an array aggregate rather than the square braces of a value sequence, which also would be a good thing to show. **************************************************************** From: Tucker Taft Sent: Sunday, October 13, 2019 12:03 PM > We don't have an example showing how an array aggregate can be used in > combination with a reduction expression. I don't think this is legal as written. If it is not a value sequence, then I think the aggregate needs to be part of a qualified expression to be legal as a "prefix" of an attribute. See latest Ada 202X RM, 4.5.10(2/5) and 4.1(2/5, 4). > This uses round braces of an array aggregate rather than the square > braces of a value sequence, which also would be a good thing to show. I am not sure, but I thought we agreed to use '[' ... ']' for most array aggregate examples. And I think here in particular it might be just confusing to use parentheses, since I think we are trying to make these two uses of 'Reduce look as similar as possible, so the user doesn't have to worry about the subtle differences between them, and instead can focus on the commonality. **************************************************************** From: Brad Moore Sent: Sunday, October 13, 2019 12:13 PM Also, while on the topic of reduction examples, I was experimenting in getting the To_Roman_Number function example of AI12-0312, which converts a string of Roman characters into a decimal value, to be proven in SPARK. I did manage to get something that SPARK proves. Though my implementation of the function is different than the one shown in AI12-0312. In particular, I was interested in creating a precondition that only allows what I consider as valid sequences of Roman Digits. (e.g. No more than 3 characters of any kind in a row) It turns out that different approaches to create such a validation can end up being quite complex. I could have written an Is_Valid function and hidden complex rules in there, but I was hoping to create an expression that shows the rules in the contract rather than hidden in the implementation. I did manage to do this. 
I believe we could replace the precondition we currently have in the example with: type Roman_Thousands is (M, MM, MMM) with Ghost; type Roman_Hundreds is (C, CC, CCC, CD, D, DC, DCC, DCCC, CM) with Ghost; type Roman_Tens is (X, XX, XXX, XL, L, LX, LXX, LXXX, XC) with Ghost; type Roman_Ones is (I, II, III, IV, V, VI, VII, VIII, IX) with Ghost; function To_Roman_Number (Roman : Roman_String) return Roman_Number with Pre => (for some I in Roman_Thousands'Range => (I'Image = Roman or else (for some J in Roman_Hundreds'Range => J'Image = Roman or else I'Image & J'Image = Roman or else (for some K in Roman_Tens'Range => K'Image = Roman or else I'Image & K'Image = Roman or else J'Image & K'Image = Roman or else I'Image & J'Image & K'Image = Roman or else (for some L in Roman_Ones'Range => L'Image = Roman or else I'Image & L'Image = Roman or else J'Image & L'Image = Roman or else K'Image & L'Image = Roman or else I'Image & J'Image & L'Image = Roman or else I'Image & K'Image & L'Image = Roman or else I'Image & J'Image & K'Image & L'Image = Roman or else J'Image & K'Image & L'Image = Roman))))); It requires that the input string must have at least one match with one of the images of the enumerations above, and also enforces all the rules one would expect for Roman numerals. (e.g. The thousands comes before hundreds, which comes before tens, etc) However, I am not suggesting we should necessarily update the example in the RM with this, unless others thought is was worthwhile. It may be considered to be too complex for a simple example. On the otherhand, it is a precise precondition contract for the function. I did find this to be an interesting exercise, none the less. **************************************************************** From: Brad Moore Sent: Sunday, October 13, 2019 12:42 PM Thinking more about this, I think it would be nice to show this in the RM, as an alternative to what we have now, instead of as a replacement, perhaps as a note? It demonstrates that contracts can have varying preciseness, and also, this could be an example of the ghost aspect, which we not have any examples for yet. **************************************************************** From: Brad Moore Sent: Sunday, October 13, 2019 12:55 PM > I am not sure, but I thought we agreed to use '[' ... ']' for most > array aggregate examples. And I think here in particular it might be > just confusing to use parentheses, since I think we are trying to make > these two uses of 'Reduce look as similar as possible, so the user > doesn't have to worry about the subtle differences between them, and instead > can focus on the commonality. Thanks Tucker for reminding me. This gives all the more reason for showing an example of how to do this in the RM. So instead, I would suggest; Append after 4.2.1 (9.d/5) -- A reduction expression that computes the sum of an array aggregate type Expenses is array (Positive range <>) of Money; -- See 3.5.9 Total_Grocery_Bill : constant Money := Expenses'[19.99, 1.49, 5.36, 75.66]'Reduce("+", 0); **************************************************************** From: Brad Moore Sent: Tuesday, October 15, 2019 9:24 PM I am definitely having second thoughts about this example, because why not simplify this further and just write an expression with all the "+"s, and not bother with 'Reduce? i.e. Nobody would write this. So I thought I'd better come up with a better example of an aggregate/reduction expression combination. 
It turns out that coming up with a good example using aggregate syntax with 'Reduce is almost as difficult as coming up with one using a combiner. A better example would be to use a reduction that is not an operator, such as 'Min or 'Max, and probably better to involve variables where the values are not known. I think the following would work a lot better as an example. type Expenses is array (Positive range <>) of Money; -- See 3.5.9 -- A reduction expression that computes the lowest estimated cost for a repair -- using an array aggregate function Cheapest_Estimate (First_Estimate, Second_Estimate, Third_Estimate : Money) is (Expenses'[First_Estimate, Second_Estimate, Third_Estimate]'Reduce(Money'Min, Money'Last)); For a function like this, if there were more parameters, then one would likely want to use an array parameter instead, but then this would not be an example involving aggregate syntax. **************************************************************** From: Jean-Pierre Rosen Sent: Wednesday, October 16, 2019 12:10 AM > I am definitely having second thoughts about this example, because why > not simplify this further and just write an expression with all the > "+"s, and not bother with 'Reduce? i.e. Nobody would write this. Agreed! ... >function Cheapest_Estimate (First_Estimate, Second_Estimate, > Third_Estimate : Money) is > (Expenses'[First_Estimate, Second_Estimate, Third_Estimate]'Reduce(Money'Min, Money'Last)); I'm afraid nobody would write this either. Honestly, if you can spell out all the values of an aggregate, it is simpler to write the reduction explicitely rather than using 'Reduce. Or you need really huge aggregates (as happens sometimes with state machines), but that would be too much for RM example... **************************************************************** From: Brad Moore Sent: Wednesday, October 16, 2019 10:37 AM Or rather nobody would write this once they realized there was a better way. ;-) I originally was comparing the reduction expression to writing an if expression without the use of 'Max, and the reduction expression is quite a bit simpler than that. But if you write the expression function as two calls to 'Max, where the second call is nested in the first, I agree that beats using a reduction expression. So, I think we can conclude from this that the use of aggregate syntax in a reduction expression is not a useful combination, and that no such example should be included in the RM. **************************************************************** From: Edmond Schonberg Sent: Sunday, October 13, 2019 5:50 PM > At the Lexington meeting earlier this month, we decided to carve out > the example of a Reduction expression involving a combiner function out > of AI12-0312-1, since we had not yet come up with a good example. ... I have also been unable to find a convincing example of use of a combiner. I can;’t find any algorithms that would benefit from this additional complication, and would suggest that we leave it out of Reduce, All examples that use reduction parallelize if the operation has the proper associativity, and the combiner adds nothing except description and implementation .complexity. **************************************************************** From: Tucker Taft Sent: Sunday, October 13, 2019 8:07 PM I'd like to give it one more try. I am pretty sure other reduction primitives allow for two distinct operations. I'll see if I can find examples associated with those. 
The classic example is where you are concatenating characters into a string, or elements into a set, where there is a distinction between adding an element to a container, and combining two containers. There can be significant overhead in first turning an element into a singleton container, so it is nice to be able skip over that step. **************************************************************** From: Jeffrey Cousins Sent: Monday, October 14, 2019 2:57 AM > would suggest that we leave it out of Reduce, All examples that use reduction Ed, just to clarify, do you mean leave it out of Reduce, or just leave it out of the example of Reduce? **************************************************************** From: Edmond Schonberg Sent: Monday, October 14, 2019 6:42 AM I mean leave it out of Reduce. It seems curious to propose and new feature and then have to look long and hard to find a possible use for it. The concatenation example is an unusual idiom that is only remotely connected to reduction and is not enough to justify the added complexity. **************************************************************** From: Tucker Taft Sent: Monday, October 14, 2019 1:34 PM OK, I did a bit of research. OpenMP allows user-defined reductions with a separate "combiner" operation. One example that popped out is that when you are summing up a large array, it is quite common that the sum needs more range than the individual elements of the array. So the "combiner" might be operating on long integers, or double-precision floats, while the individual elements are shorter integers or single-precision floats. On the other hand, you generally need to convert each element before adding it into the ongoing total, so unless you have a special operation that combines a conversion and an add, there is no special requirement for a separate operation, though clearly a pure array-oriented "reduce" without any ability to convert the value first to a larger type would be a problem. However, for the Reduce operation we have proposed that operates on a value sequence rather than an array, we provide the opportunity to specify a conversion (or some more general "map" operation as part of the "map/reduce" paradigm), so a conversion by itself does not seem enough to justify the additional operation. I am going to look at examples for OpenMP "declare reduction" -- so far they deal with the case that the operation is not one of the operators that OpenMP has built-in knowledge of, such as what is the "identity" of the operator, etc. We already account for that situation. For what it is worth, here is Michael Klemm's presentation on OpenMP 4.0, where the "declare reduction" was introduced: https://www.hpc2n.umu.se/sites/default/files/2.04%20Outlook%20on%20OpenMP%204.pdf His example is on slide 11. More to come! **************************************************************** From: Tucker Taft Sent: Monday, October 14, 2019 6:55 PM After more research, I am coming around to Ed's view. I think given the "roll-your-own" reduction using a chunk parameter, the fact that the element/container examples are best handled with a container aggregate, and similarly the character/string example can be handled with an array aggregate based on an iterator, we have enough to make the "leap" to eliminate the option where the reducer and the combiner would be different. So in the simplified version we would only have a reducer, and it would be required to have the same type as both inputs and as output. 
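As a rough sketch of the array-aggregate alternative mentioned above (the function name is invented for illustration, and this assumes the Ada 202X iterated component associations in array aggregates): a mapped string can be built in one shot with an aggregate, rather than by the character-by-character concatenation reduction that motivated a separate combiner.

   with Ada.Characters.Handling; use Ada.Characters.Handling;
   --  Illustrative only: the mapped string is built with an array
   --  aggregate over an iterator, rather than by a concatenation-style
   --  reduction (the case that motivated combiners).
   function To_Upper_Copy (Input : String) return String is
   begin
      return [for C of Input => To_Upper (C)];
   end To_Upper_Copy;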
We could keep the reducer/combiner option for a future standard if we get a huge groundswell for it... ;-) But it is worth remembering that the container aggregate, generalized array aggregate, and the chunk-parameter mechanism are what make this unnecessary, so those are providing important value. **************************************************************** From: Brad Moore Sent: Tuesday, October 15, 2019 9:03 PM I think it is important to also remember that the combiner is only needed for parallelism. Without it, the reducer still can work, but would need to be sequential if the parameters of the reducer are of different types. That is, I don't see that container aggregate and generalized array aggregate syntax replace use cases for combiners, in that the parallel keyword is not allowed currently in aggregate syntax, and so in theory would not facilitate parallelism in cases where a combiner would be needed. For parallelism, though, based on previous examples, the combiner parameter was pretty much only needed for reductions involving concatenation (eg. for arrays or vector containers), or similarly concatenation-like operations such populating a container by appending items (e.g. to a linked list, vector, or into a set container). For something like concatenating letters into a string, it is difficult to imagine an example where parallelism would be worthwhile. It could be, if the amount of processing needed to generate each letter was significant, but not likely to be a real example. Concatenating matrices into a vector is likely a more realistic example that could benefit from parallelism. But then it seems unlikely to me that one would be manufacturing matrices out of thin air, and using concatenation-like operations. Rather, it seems more likely that there would already be an array of data to be used for generating the vector of matrices. (eg. Perhaps multiplying vectors together to produce matrices?) That is, the number of elements of the resulting array of matrices is known prior to processing, and could be used to size the result before the processing begins. In that case, as Tucker suggests, I believe a chunked parallel loop could be used with a Vector container to place each computed matrix into the correct slot of the result. So I think I agree that while there may be some cases where a combiner could be useful, it does seem that the possible uses are few and far between in practice, and probably not worth the additional complexity at this time, unless as Tucker says, there is a huge groundswell and demand in the future to support cases that have so far been eluding us. In addition to the roll-your-own chunking loop syntax, I think the "map" part of the map/reduce syntax for 'Reduce also helps to eliminate the need for combiners, in cases where the map can convert a value into a matching type for a reducer without introducing too much overhead. **************************************************************** From: Randy Brukardt Sent: Tuesday, November 26, 2019 8:19 PM >>Ed, just to clarify, do you mean leave it out of Reduce, or just leave it >>out of the example of Reduce? >I mean leave it out of Reduce. It seems curious to propose and new feature >and then have to look long and hard to find a possible use for it. The >concatenation example is an unusual idiom that is only remotely connected >to reduction and is not enough to justify the added complexity. Tucker eventually replied: >After more research, I am coming around to Ed's view. Brad eventually concurred as well. 
Since this is "just" deleting some text, I'm happy to take a swing at writing this up formally. And it would be valuable to see how much simpler the wording would be without it. However, it's going to take several hours to do (this is a large section, and other simplifications to the wording are possible once the combiner stuff is gone). So I wanted to verify that we are proposing to remove all vestiges of the combiner from the wording before spending the time. A second question: are we going to keep the possibility for a *sequential* reduce to have operands of different types? I don't see much reason to remove it (it probably simplifies the wording slightly [mainly getting rid of a Legality Rule for parallel reduces], but it doesn't seem to have much impact on the implementation which is the same either way). Probably the only thing it complicates is resolution, and that is a mess with or without the extra capability. Thoughts? **************************************************************** From: Brad Moore Sent: Tuesday, November 26, 2019 11:02 PM There is one issue that likely involves adding some text, however. The thought was that array aggregates were a better way to write concatenation, and would be a better way to express parallelism for such a problem. However, currently array aggregates do not allow the parallel keyword. If we could allow that, then I think it is a no-brainer for removing the combiner from reduce. However, without that, we would then be losing the ability to do explicit parallel concatenation expressions. So, should we allow "parallel" on array aggregates when they involve an iterated_component_association? > A second question: are we going to keep the possibility for a > *sequential* reduce to have operands of different types? I don't see > much reason to remove it (it probably simplifies the wording slightly > [mainly getting rid of a Legality Rule for parallel reduces], but it > doesn't seem to have much impact on the implementation which is the > same either way). Probably the only thing it complicates is > resolution, and that is a mess with or without the extra capability. > > Thoughts? I would think that it is probably good to leave support for the *sequential* case, unless someone comes up with good reason to disallow that. **************************************************************** From: Randy Brukardt Sent: Tuesday, November 26, 2019 11:15 PM > However, currently array aggregates do not allow the parallel keyword. > If we could allow that, then I think it is a no-brainer for removing > the combiner from reduce. However, without that, we would then be > losing the ability to do explicit parallel concatenation expressions. > So, should we allow "parallel" on array aggregates when they involve > an iterated_component_association? That was in a separate thread, and I didn't see it until after I re-read the first one. Anyway, it doesn't have anything to do with 4.5.10 as it is solely about aggregates, so it seems OK to treat that separately. (Especially as someone needs to explain what needs to be added to the container aggregates and to the Ada.Containers packages in order to enable real parallel operation. That's not clear to me, especially for the associative containers.) > > A second question: are we going to keep the possibility for a > > *sequential* reduce to have operands of different types? 
I don't see > > much reason to remove it (it probably simplifies the wording > > slightly [mainly getting rid of a Legality Rule for parallel > > reduces], but it doesn't seem to have much impact on the > > implementation which is the same either way). Probably the only > > thing it complicates is resolution, and that is a mess with or without the > > extra capability. > > I would think that it is probably good to leave support for the > *sequential* case, unless someone comes up with good reason to > disallow that. That's my position, too. It doesn't change the mental model (unlike the separate combiner function), it doesn't change or complicate the implementation, and it might be useful in some contexts. We've already done the hard work of wording it, so leave it. **************************************************************** From: Arnaud Charlet Sent: Wednesday, November 27, 2019 2:38 AM > Since this is "just" deleting some text, I'm happy to take a swing at > writing this up formally. And it would be valuable to see how much > simpler the wording would be without it. However, it's going to take > several hours to do (this is a large section, and other > simplifications to the wording are possible once the combiner stuff is > gone). So I wanted to verify that we are proposing to remove all > vestiges of the combiner from the wording before spending the time. Yes, let's remove it. **************************************************************** From: Tucker Taft Sent: Wednesday, November 27, 2019 7:37 AM >> ... >> I would think that it is probably good to leave support for the >> *sequential* case, unless someone comes up with good reason to >> disallow that. > > That's my position, too. It doesn't change the mental model (unlike > the separate combiner function), it doesn't change or complicate the > implementation, and it might be useful in some contexts. We've already > done the hard work of wording it, so leave it. That seems fine. So we get rid of the "combiner" parameter and all mention of it, and in the presence of "parallel" we require the types of the parameters of the "reducer" be the same. **************************************************************** From: Randy Brukardt Sent: Tuesday, January 28, 2020 8:41 PM In updating the AARM notes for this AI, I discovered that the revised Bounded Error in the AI has no relationship to any remaining semantics. Thus it could be dropped. However, that made it clear that there was no mention of the fact that a non-associative reducer in a parallel reduction generates unspecified results. (I had thought that the existing Bounded Error covered that case, but it obviously did not.) After discussion with Steve and Tucker, we agreed that we still need a Bounded Error for a non-associative reducer. We then crafted the following text: For a parallel reduction expression, it is a bounded error if the reducer subprogram is not associative. That is, for any arbitrary values of subtype Value_Type A, B, C and a reducer function R, the result of R (A, R (B, C)) should produce a result equal to R (R (A, B), C)). The possible consequences are Program_Error, or a result that does not match the equivalent sequential reduction expression due to the order of calls on the reducer subprogram being unspecified in the overall reduction. Analogous rules apply in the case of a reduction procedure. 
AARM Reason: In a sequential reduction expression, the reducer subprogram is called in a left-to-right order, whereas in a parallel reduction expression, the reducer subprogram is called in an order that depends on the number of logical threads of control that execute the reduction. If the reducer is associative, this order does not matter, but in other cases, very different results are possible. While one can specify the maximum number of chunks, the actual number of chunks is unspecified. Thus, to get a consistent and portable result, an associative reducer is required. We define this as a Bounded (Run-Time) Errors to provide a stern warning about the required nature of the reducer subprogram and to let compilers detect the problem when possible. AARM To be honest: In this rule, “equal” means semantically equal. We don't care if the bit patterns differ but that the results mean the same thing. In particular, if the primitive equal is user-defined, that equality would be the one used to determine if this rule is violated. [Randy's note: This last AARM note is carried over from the previous Bounded Error, it still seems relevant.] I also added the following to the !discussion of the AI: Once all of the rules specific to combiners are removed, it became apparent that there was no rule requiring an associative reducer subprogram for parallel reduction. That is necessary in order to get a consistent and portable result from a reduction, as the actual number of chunks used and the split of elements/components are both unspecified. (The maximum number of chunks can be specified, but the implementation doesn't have to use that number. There are restrictions on allowed splits, but not enough to determine the number of elements in each chunk.) We do that with a Bounded Error similar to the original error. For example, the following reduction is well-defined: Str : String := ... type Hash_Type is mod ; function Hash (Accum, Value : in Hash_Type) return Hash_Type is (2*Accum + Value); ... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0); The reduction here creates a hash value for Str. But the similar parallel reduction triggers the Bounded Error: ... := [parallel for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0); This makes the result (ahem) hash, since Hash is not associative: Hash (A, Hash (B, C)) = 2*A + 2*B + C -- Unintended result! Hash (Hash (A, B), C) = 4*A + 2*B + C -- Intended result. ------------------ We're considering this as part of the editorial review of the AI. Since this is unrelated to the purpose of this AI (it's fixing a bug in the underlying AIs), if anyone wants to discuss it properly, we would make it into a separate AI for discussion. It should be noted that we considered just using a user Note for this purpose. The Dynamic Semantics defines the order of calls on Reduce (and that it is different than the sequential case), so we could just note that the result depends on two properties that are mostly unspecified if the Reducer is non-associative. Tucker and Steve both felt this was a new case unlike existing order dependencies and that it therefore needs greater visibility than just a user note. Again, if anyone wants a full discussion on this point, please speak up. (And feel free to wack me with any typos I managed to add to the above.) 
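As a concrete check of the associativity criterion in the wording crafted above, here is a self-contained sketch; the procedure name, the 2**32 modulus, and the sample values are only illustrative, since the original Hash_Type declaration leaves the modulus unspecified:

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Check_Hash_Associativity is
      type Hash_Type is mod 2**32;  --  modulus chosen only for the sketch
      function Hash (Accum, Value : in Hash_Type) return Hash_Type is
        (2*Accum + Value);
      A : constant Hash_Type := 3;
      B : constant Hash_Type := 5;
      C : constant Hash_Type := 7;
   begin
      --  Sequential (left-to-right) grouping: 4*A + 2*B + C = 29
      Put_Line (Hash_Type'Image (Hash (Hash (A, B), C)));
      --  A grouping that can arise across chunks: 2*A + 2*B + C = 23
      Put_Line (Hash_Type'Image (Hash (A, Hash (B, C))));
      --  The results differ, so Hash is not associative and a parallel
      --  'Reduce using it is a bounded error.
   end Check_Hash_Associativity;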
**************************************************************** From: Brad Moore Sent: Tuesday, January 28, 2020 9:27 PM > After discussion with Steve and Tucker, we agreed that we still need a > Bounded Error for a non-associative reducer. We then crafted the > following > text: > > For a parallel reduction expression, it is a bounded error if the > reducer subprogram is not associative. That is, for any arbitrary > values of subtype Value_Type A, B, C and a reducer function R, the > result of R (A, R (B, C)) should produce a result equal to R (R (A, > B), C)). The possible consequences are Program_Error, or a result that > does not match the equivalent sequential reduction expression due to > the order of calls on the reducer subprogram being unspecified in the > overall reduction. Analogous rules apply in the case of a reduction procedure. ... > We do that with a Bounded Error similar to the original error. > > For example, the following reduction is well-defined: > > Str : String := ... > type Hash_Type is mod ; > > function Hash (Accum, Value : in Hash_Type) return Hash_Type is > (2*Accum + Value); > > ... := [for I in Str'range => Character'Pos(Str(I))]'Reduce (Hash, 0); > > The reduction here creates a hash value for Str. But the similar > parallel reduction triggers the Bounded Error: > > ... := [parallel for I in Str'range => > Character'Pos(Str(I))]'Reduce (Hash, 0); > > This makes the result (ahem) hash, since Hash is not associative: > > Hash (A, Hash (B, C)) = 2*A + 2*B + C -- Unintended result! > Hash (Hash (A, B), C) = 4*A + 2*B + C -- Intended result. > Another example that is more likely to be seen in practice: Technically floating point operations such as addition are not associative either, though still useful for parallel execution. Non-associativity can sometimes be in the eye of the beholder. If you are looking for a precise bit pattern for a hash lookup, you might care more about the non-associativity, but if you only looking for an approximation, the non-associativity of the floating point can be overlooked. Even the approximation has tolerance limits. If I use 32 bit floating point addition of floating point values from 1.0 to N, I get associative matching output for values of N up to about 5700.0. After that errors start to appear, but still a reasonable approximation. For values of N around 100_000_000 however, the parallel results are complete garbage. Using 64 bit floating point however, gives associative output for values of N at least as high as 4_000_000_000. (I haven't hit the limit of non-associativity there yet, the processing time starts to get too long to wait for the results.) This illustrates the bounded error proposed in Randy's submission. Whether the error is acceptable depends on the needs of the programmer, and the bounds of the calculation. I agree that a bounded error is the better approach than a user note in the wording. I cant see a compiler being able to always decide where the line should be drawn for what is an acceptable level of associativity for cases such as this. **************************************************************** From: Brad Moore Sent: Tuesday, January 28, 2020 10:06 PM > Even the approximation has tolerance limits. If I use 32 bit floating > point addition of floating point values from 1.0 to N, I get > associative matching output for values of N up to about 5700.0. > After that errors start to appear, but still a reasonable > approximation. 
For values of N around 100_000_000 however, the > parallel results are complete garbage. > > Using 64 bit floating point however, gives associative output for > values of N at least as high as 4_000_000_000. > (I haven't hit the limit of non-associativity there yet, the > processing time starts to get too long to wait for the results.) For the record, Actually, for the 4_000_000_000 test, I was using 128 bit floating point (GNAT's Long_Long_Float), not 64 bit (GNAT's Long_Float). **************************************************************** From: Tucker Taft Sent: Tuesday, January 28, 2020 11:11 PM I am not sure floating point should be rolled into this. That is more of a "normal" order dependence, and other standards like OpenMP see this as a numerical analysis problem, rather than a real "bug". I would be reluctant to allow a compiler to raise Program_Error in this case. **************************************************************** From: Brad Moore Sent: Friday, January 31, 2020 12:31 AM I see your point, but I think a program error would be welcomed if the answer is wrong enough where enough error has accumulated. For example, if the most significant digit or the exponent value differs from what you'd get using higher precision calculation. While such errors can be difficult to detect, I don't think it is impossible to detect. And the problem is not necessarily related to parallelism. You can get answers just as wrong using sequential calculations. I can imagine there being an error tolerance threshold (possibly user configurable) for the compiler and a floating point verification mode where the same calculations are done (possibly in parallel) using two levels of precision (the user specified precision, and some higher level of precision). If the most significant digit differs between the two results, a program error could be raised. Perhaps another approach might be to keep track of the total digits of precision thrown away, such as adding many small floating point values to a single large value. Other (normal) compilation modes or other compilers wouldn't have to detect the error. For example, if I sequentially calculate the sum of integer values cast to 32 bit floating point from 1.0 to 100_000_000 I get a result of; 2_251_799_813_685_248.000 If I didn't know there was an issue, I might assume that is the correct answer, with possible disastrous consequence depending on what the number was being used for. But if I use 128 bit floating point, the result is; 5_000_000_050_000_000.000 Which is the correct answer and more than double the first result. It'd be nice while executing my 32 bit float program in a floating point verification mode, to be told that there is a program error due to insufficient precision and too much error accumulation if the answers I am getting are wildly incorrect. Changing my code to use say, a 64 bit float type instead might be sufficient to address the program error. OpenMP might call it a numerical analysis problem, and that is OK for use in languages such as C. But in Ada, with a higher focus on safety critical and high integrity systems, it seems that treating this as a bounded error might possibly be considered as another notch of safety that Ada has over other languages? **************************************************************** From: Tucker Taft Sent: Friday, January 31, 2020 1:07 AM > I see your point, but I think a program error would be welcomed if the > answer is wrong enough where enough error has accumulated. ... 
I would be surprised that an occasional data-dependent Program_Error would be "welcomed." Some kind of compile-time warning might be reasonable, but a run-time exception that is raised now and then sounds like a bit of a nightmare. **************************************************************** From: Randy Brukardt Sent: Friday, January 31, 2020 1:57 PM > I see your point, but I think a program error would be welcomed if the > answer is wrong enough where enough error has accumulated. Sure, but this has exactly nothing to do with reductions (sequential or parallel). It's a consequence of the model of floating point math, and it surely doesn't make any sense to single out this particular case. The interesting point brought up by your messages is that it's possible that an over-literal compiler implementer might interpret the "equality" as having something to do with the (relatively useless) "=" operator for floating point types. ... > While such errors can be difficult to detect, I don't think it is > impossible to detect. > And the problem is not necessarily related to parallelism. > You can get answers just as wrong using sequential calculations. Exactly. If you wanted to detect these errors, you could use interval math. If the final interval is too wide, the answer is nonsense. All you would need is an interval math package to use instead of the basic floating point. I know the 8087 (which is the instruction set available on all modern x86 processors) has modes to make implementing that easier (round down and round up modes), and I would guess those came from IEEE (pretty much everything else did). Since Ada has operator overloading, it would be pretty easy to drop in such a thing (of course, attributes would have to be replaced by function calls, how bad that is depends on the number of function calls). It doesn't seem necessary to standardize such a thing, certainly not if there isn't any demand from actual Ada users. **************************************************************** From: Jean-Pierre Rosen Sent: Friday, January 31, 2020 2:47 AM > Exactly. If you wanted to detect these errors, you could use interval math. CADNA (http://cadna.lip6.fr/) is the tool you need. It was initially developped for Ada with operator overloading, but unfortunately it is now available only for C/C++/Fortran... **************************************************************** From: Tullio Vardanega Sent: Friday, January 31, 2020 3:41 AM ... > I would be surprised that an occasional data-dependent Program_Error would > be "welcomed." Some kind of compile-time warning might be reasonable, but > a run-time exception that is raised now and then sounds like a bit of a > nightmare. I am firmly with Tuck on this view, viz. compile-time warning instead of program error. **************************************************************** From: Arnaud Charlet Sent: Friday, January 31, 2020 3:45 AM Agreed. And as Randy said, the issue is much broader and concerns the semantics and dangers of floating point, and would better be handled by a general purpose library (with e.g. operator overloading) rather than via a special case. ****************************************************************
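A minimal sketch of the floating-point accumulation effect discussed in this thread, assuming Float is 32 bits and Long_Float is 64 bits; the procedure name and the value of N are illustrative only:

   with Ada.Text_IO; use Ada.Text_IO;
   procedure Float_Sum_Comparison is
      N   : constant := 100_000_000;
      S32 : Float      := 0.0;
      S64 : Long_Float := 0.0;
   begin
      for I in 1 .. N loop
         S32 := S32 + Float (I);
         S64 := S64 + Long_Float (I);
      end loop;
      --  Once the 32-bit accumulator's rounding unit exceeds the values
      --  being added, further additions are lost, so the two sums diverge.
      Put_Line ("32-bit sum: " & Float'Image (S32));
      Put_Line ("64-bit sum: " & Long_Float'Image (S64));
   end Float_Sum_Comparison;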