CVS difference for ai12s/ai12-0119-1.txt

Differences between 1.8 and version 1.9
Log of other versions for file ai12s/ai12-0119-1.txt

--- ai12s/ai12-0119-1.txt	2017/09/07 02:46:48	1.8
+++ ai12s/ai12-0119-1.txt	2017/10/12 01:59:50	1.9
@@ -1,4 +1,4 @@
-!standard 5.5.2 (2/3)                              17-06-10    AI12-0119-1/03
+!standard 5.5.2 (2/3)                              17-10-11    AI12-0119-1/04
 !class Amendment 14-06-20
 !status work item 14-06-20
 !status received 14-06-17
@@ -8,16 +8,16 @@
 !summary
 
 New syntax and semantics to facilitate parallelism via Parallel Loops,
-Parallel Blocks, and Parallel Reduction Expressions.
+Concurrent Blocks, and Parallel Reduction Expressions.
 
 !problem
 
 The increased presence of parallel computing platforms brings concerns to the
-general purpose domain that were previously prevalent only in the specific
-niche of high-performance computing. As parallel programming technologies
-become more prevalent in the form of new emerging programming languages and
-extensions of existing languages, safety concerns need to
-consider the paradigm shift from sequential to parallel behavior.
+general purpose domain that were previously prevalent only in the specific niche
+of high-performance computing. As parallel programming technologies become more
+prevalent in the form of new emerging programming languages and extensions of
+existing languages, safety concerns need to consider the paradigm shift from
+sequential to parallel behavior.
 
 Ada needs mechanisms whereby the compiler is given the necessary semantic
 information to enable the implicit and explicit parallelization of code.
@@ -27,17 +27,16 @@
 
 !proposal
 
-This proposal depends on the facilities for aspect Global (AI12-0079-1)
-and for aspect Potentially_Blocking (AI12-0064-1).
-Those proposals allow the compiler to statically determine where
-parallelism may be introduced without introducing data races.
+This proposal depends on the facilities for aspect Global (AI12-0079-1) and for
+aspect Nonblocking (AI12-0064-2). Those proposals allow the compiler to
+statically determine where parallelism may be introduced without introducing
+data races.
 
 This proposal informally introduces the semantic notion of a Parallel
-OPportunity (POP) and a tasklet to the language. POPs are places in
-program text where work can be spawned to parallel executing workers
-that work in concert to correctly execute the algorithm. Tasklets are
-the notational (logical) units within a task that are executed in
-parallel with each other.
+OPportunity (POP) and a tasklet to the language. POPs are places in program text
+where work can be distributed to parallel executing workers that work in concert
+to correctly execute the algorithm. Tasklets are the notational (logical) units
+within a task that are executed in parallel with each other.
 
 The goals of this proposal are:
 - To permit existing programs to benefit from parallelism through
@@ -60,7 +59,7 @@
 
 Note that in this model the compiler will identify any code where a
 potential data race occurs (following the rules for concurrent access
-to objects as specified in the Language Reference Manual RM 9.10(23),
+to objects as specified in the Language Reference Manual RM 9.10(23)),
 and point out where objects cannot be guaranteed to be independently
 addressable. If not determinable at compile-time, the compiler may
 insert run-time checks to detect data overlap.
@@ -73,33 +72,41 @@
 which the tasklet is spawned. On the other hand, calls by different
 tasklets of the same task into the same protected object are treated as
 different calls resulting in distinct protected actions; therefore
-synchronization between tasklets can be performed using non-blocking
-protected operations. Note that this is consistent with the current
-standard which already supports multiple concurrent calls by a single
-task in the presence of the asynchronous transfer of control capability
+synchronization between tasklets can be performed using protected
+operations. Note that this is consistent with the current standard which
+already supports multiple concurrent calls by a single task in the
+presence of the asynchronous transfer of control capability
 (RM 9.7.4 (23)).
 
 There are three areas of this proposal: one that introduces capabilities
-for parallel blocks, another that introduces capabilities for
+for parallel/concurrent blocks, another that introduces capabilities for
 parallel loops, and another that introduces capabilities for parallel
 reduction.
 
-Parallel Blocks
+Concurrent Blocks
 ---------------
 
-Parallel blocks may be used to specify that two or more parts of an
-algorithm may be executed in parallel with each other.
-
-Semantics: A parallel block statement encloses two or more sequences of
-statements (two or more "parallel sequences") separated by the reserved
-word "and".  Each parallel sequence represents a separate tasklet, but
+Concurrent blocks may be used to specify that two or more parts of an
+algorithm may be executed concurrently, and possibly in parallel with
+each other.
+
+Semantics: A concurrent block statement encloses two or more sequences of
+statements (two or more "concurrent sequences") separated by the reserved
+word "and".  Each concurrent sequence represents a separate tasklet, but
 all within a single Ada task. Task identity remains that of the
 enclosing Ada task, and a single set of task attributes is shared
-between the tasklets.
+between the tasklets. Each sequence of statements is assigned a unique
+and independent executor to execute the tasklet. If the parallel keyword
+is present, then each executor can execute on any core or migrate
+independently to other cores and the tasklets can execute in parallel
+with each other. If the parallel keyword is absent, then each executor
+associated with the concurrent block is assigned to the core of the
+enclosing Ada task, unless the compiler can determine implicitly
+that the tasklets can safely be executed on different cores in parallel.
 
 With respect to the rules for shared variables (see RM 9.10(13)), two
-actions occurring within two different parallel sequences of the same
-parallel block are not automatically sequential, so execution can be
+actions occurring within two different concurrent sequences of the same
+concurrent block are not automatically sequential, so execution can be
 erroneous if one such action assigns to an object, and the other reads
 or updates the same object or a neighboring object that is not
 independently addressable from the first object.  The appropriate use of
@@ -110,23 +117,23 @@
 specified to enable the static detection of such problems at compile
 time (see AI12-0079-1 and AI12-0064-1).
 
-Any transfer of control out of one parallel sequence will initiate the
-aborting of the other parallel sequences not yet completed.  Once all
-other parallel sequences complete normally or abort, the transfer of
-control takes place.  If multiple parallel sequences attempt a transfer
+Any transfer of control out of one concurrent sequence will initiate the
+aborting of the other concurrent sequences not yet completed.  Once all
+other concurrent sequences complete normally or abort, the transfer of
+control takes place.  If multiple concurrent sequences attempt a transfer
 of control before completing, one is chosen arbitrarily and the others
 are aborted.
 
-If an exception is raised by any of the parallel sequences, it is
+If an exception is raised by any of the concurrent sequences, it is
 treated similarly to a transfer of control, with the exception being
 propagated only after all the other sequences complete normally or due
-to abortion.  If multiple parallel sequences raise an exception before
-completing, one is chosen arbitrarily and the others are aborted. The
-parallel block completes when all of the parallel sequences complete,
-either normally or by being aborted.
+to abortion.  If multiple concurrent sequences executing in parallel
+raise an exception before completing, one is chosen arbitrarily and the
+others are aborted. The concurrent block completes when all of the
+concurrent sequences complete, either normally or by being aborted.
 
 Note that aborting a tasklet need not be preemptive, but should prevent
-the initiation of further nested parallel blocks or parallel loops.
+the initiation of further nested concurrent blocks or parallel loops.
 
 Parallel Loops
 --------------
@@ -142,15 +149,15 @@
 these having to be specified by the programmer.
 
 To indicate that a loop is a candidate for parallelization, the reserved
-word "parallel" may be inserted immediately after the word "in" or "of"
-in a "for" loop, at the point after where the "reverse" reserved word is
-allowed. Such a loop will be broken into chunks, where each chunk is
-processed sequentially.
+word "parallel" may be inserted immediately before the word "for" in a
+"for" loop. Such a loop is broken into chunks of iterations, where each
+chunk is processed sequentially, but potentially in parallel with the
+other chunks of iterations.
 
-Note that the same rules presented for parallel blocks above apply to
+Note that the same rules presented for concurrent blocks above apply to
 the update of shared variables and the transfer of control to a point
 outside of the loop, and for this purpose each iteration (or chunk) is
-treated as equivalent to a separate sequence of a parallel block.
+treated as equivalent to a separate sequence of a concurrent block.
 
 Reduction Expressions
 ---------------------
@@ -160,81 +167,121 @@
 This mechanism is called reduction.
 
 Indeed, a quantified expression can be viewed as being a special purpose
-Reduction Expression that applies an operation, the Predicate, to a set of values
-to reduce the predicate results to a single Boolean result.  A Reduction expression
-is a more general syntactic form where there is less constraint on the type of
-operation to be applied to the set of values, and the operation can produce result
-types other than Boolean values.
-
-Semantics: A Reduction Expression looks like a quantified expression, except
-that the quantifier keyword ("all" or "some") is not present, and the predicate
-expression is replaced with a function call that returns a constrained, nonlimited
-type, that is called the combiner_function_call. The Reduction expression must involve
-a function call having at least one parameter of the same type as the type of the
-Reduction expression. The combiner function is called iteratively to accumulating a
-single result, which becomes the result of the Reduction Expression. The accumulation
-of the result occurs as the result of each iteration is implicitly fed back as an input
-parameter to a function for each subsequent call. The implicit parameter is identified
-syntactically in the call based on the "<>" box notation syntax, except that the box
-notation encloses the initial value to be used for the first call to the function, since
-at that point there would otherwise be no existing value to pass into the function. The
-initial value also serves to be the result of the reduction expression for the case when
-there are no iterations performed (E.g. For the case when a when a
-loop_parameter_specification for a Reduction Expression specifies a null range).
-
-To indicate that a Reduction Expression is a candidate for parallelization, the
-reserved word "parallel" may be inserted immediately after where the "reverse"
-reserved word is allowed. Similar to the way Parallel Loops are broken into chunks,
-a Parallel Reduction expression will also have chunking applied, where each chunk
-is processed sequentially. For each chunk, an accumulator result is generated that is
-local to the tasklet, and for the first call to the subprogram for each chunk the
-initial value is used to initialize the local accumulated result. As the chunk results
-are calculated in parallel, the result of the Reduction Expression is generated by
-combining/reducing the final function results of each chunk by applying a Reducer
-function for the Reduction Expression. The Reducer function for a parallel Reduction
-expression is a function that accepts two parameters where both parameters are of the
-same type, and returns a result of that same type, which is also the type of the
-Reduction expression. The Combiner function must have a Reducer aspect specified for
-its declaration if the function itself is not a Reducer function. The Reducer aspect
-identifies another function that is to be used to combine multiple chunk results into
-the final result of the Reduction expression. The multiple results are combined two at
-a time.
-
-The initial value specified by the programmer must match (be confirming to) the
-value of the Identity aspect associated with the Reducer function for the Reduction
-Expression, if the Reducer function has the aspect. The Identity aspect of a function
-identifies a value of the same type as the function result. If the Identity aspect is not
-associated with the Reducer function, then the Initial value specified by the
-programmer is assumed to be correct.
-
-The Reducer function is expected to generate an associative result based on the input
-parameters. A Reducer function does not need to be commutative (e.g. vector concatenation),
-as it is expected that the implementation will ensure that the results are combined in
-a manner that is consistent with sequentially applying the Reducer function to the
-chunk results in iteration order. If the parallel keyword is not present in the Reduction
-Expression, then sequential computation is assumed and the Reducer aspect does not need to
-be specified for the combiner function.
-
-For parallel Reductions Expressions, it is important that the value of the Identity
-aspect associated with a function does not affect the result of the computation. For
-example, if the reduction result type is Integer and the reducer is addition, then the
-Identity value should be zero, since adding zero to any value does not affect the value.
-Similarly, if the reducer is multiplication then the Identity value should be one, since
-multiplying any Integer value by 1 does not affect the value. If the result
-type is a vector and the reducer is concatenation, then the Identity value should
-be an empty vector, since concatenating an empty vector to another vector does
-not affect the value of the other vector.
+Reduction Expression that applies an operation, the Predicate, to a set of
+values to reduce the predicate results to a single Boolean result. Similarly,
+an array aggregate written as an iterated_component_association can also be
+viewed as a special purpose Reduction Expression that applies an operation,
+in-place concatenation, to a set of values to reduce them into an array
+result.
+
+A Reduction expression is a more general syntactic form where there is
+less constraint on the type of operation to be applied to the set of
+values, and the operation can produce results of types other than
+Boolean values or arrays.
+
+Semantics: A Reduction Expression looks like a quantified expression,
+except that the quantifier keyword ("all" or "some") is not present, and
+the predicate expression is replaced with an expression that evaluates
+to a function call returning an object of a nonlimited type, that is
+called the combiner_function_call.
+
+The combiner function call must have at least one parameter of the same
+type as the type of the Reduction expression. The combiner function call
+is called iteratively to accumulate a result, which ultimately becomes
+the result of the Reduction Expression. The accumulation of the result
+occurs as the result of each iteration is implicitly fed back as an
+input parameter to the combiner function call. The implicit parameter is
+identified syntactically based on the "<>" box notation syntax, except
+that the box notation can optionally enclose the initial value to be
+used for the first call to the function, since at that point there would
+otherwise be no existing value to pass into the function. The initial
+value also serves as the result of the reduction expression for the
+case when no iterations are performed (e.g. when a
+loop_parameter_specification for a Reduction Expression specifies a
+null range).
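+
+As a small worked illustration of this feedback, in
+
+   (for I in 1 .. 3 => <10> + I)
+
+the first call evaluates 10 + 1 = 11, the second 11 + 2 = 13, and the
+third 13 + 3 = 16, which is the value of the reduction expression.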
+
+If the initial value is not specified, then the initial value is assumed
+to be a special value called the identity value that is associated with
+the combiner function call. The identity value is specified either by
+applying an Identity aspect to the declaration of the function denoted
+by the combiner function call, or indirectly by applying it to another
+function, called a reducer function, that is associated with the
+combiner function call via a Reducer aspect applied to the function
+denoted by the combiner function call. The Identity aspect of a function
+identifies a value of the same type as the combiner function call result.
+If the combiner_function_call box parameter does not specify an initial
+value and the combiner_function_call is not associated with an identity
+value, then the reduction expression is illegal.
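+
+For illustration (assuming Arr is an array of Integer, as in the 4.5.9
+examples below), the initial value can be omitted from the box, in which
+case the Identity => 0 proposed for the predefined Integer "+" in the
+A.1 changes below supplies it:
+
+   Sum : constant Integer := (for X of Arr => <> + X);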
+
+To indicate that a Reduction Expression is a candidate for
+parallelization, the reserved word "parallel" may be inserted
+immediately before the reserved word "for". Similar to the way Parallel
+Loops are broken into chunks, a Parallel Reduction expression will also
+have chunking applied, where each chunk is processed sequentially, but
+potentially in parallel with other iteration chunks of the expression.
+
+For each chunk, an accumulator result is generated that is local to the
+tasklet. The local accumulator result for each chunk is initialized to
+the identity value, except that the first chunk is initialized to the
+initial value, if one is specified; otherwise the first chunk is also
+initialized to the identity value. As the chunk results are calculated
+in parallel, the
+result of the Reduction Expression is generated by combining/reducing
+the final function results of each chunk by applying a Reducer function
+for the Reduction Expression. A parallel reduction expression is illegal
+if the combiner function call is not associated with an identity value,
+regardless of whether an initial value is specified in the box parameter
+or not.
+
+The Reducer function for a parallel Reduction expression is a function
+that accepts two parameters where both parameters are of the same type,
+and returns a result of that same type, which is also the type of the
+Reduction expression.
+
+The Combiner function must have a Reducer aspect specified for
+its declaration if the function itself is not a Reducer function. The
+Reducer aspect identifies another function that is to be used to combine
+multiple chunk results into the final result of the Reduction expression.
+The multiple results are combined two at a time.
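+
+As an illustrative sketch (using hypothetical Int_Vector, Empty_Vector,
+Concat, and Append declarations), a combiner that appends a single
+element to a vector is not itself a Reducer function, so its declaration
+names one via the Reducer aspect:
+
+   function Concat (Left, Right : Int_Vector) return Int_Vector
+      with Identity => Empty_Vector;
+   function Append (Left : Int_Vector; Right : Integer) return Int_Vector
+      with Reducer => Concat;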
+
+The Reducer function is expected to generate an associative result based on the
+input parameters. A Reducer function does not need to be commutative (e.g.
+vector concatenation), as it is expected that the implementation will ensure
+that the results are combined in a manner that is consistent with sequentially
+applying the Reducer function to the chunk results in iteration order. If the
+parallel keyword is not present in the Reduction Expression, then sequential
+computation is assumed and the Reducer aspect does not need to be specified for
+the function declaration denoted by the combiner function call.
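+
+As an illustration of this iteration-order requirement, for the alphabet
+example given in 4.5.9 below, a chunk covering 'A' .. 'M' and a chunk
+covering 'N' .. 'Z' must be concatenated in that order; concatenation is
+associative but not commutative, so reversing the operand order would
+produce a different (wrong) result.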
+
+For parallel Reduction Expressions, it is important that the value of the
+Identity aspect associated with a function does not affect the result of the
+computation. For example, if the reduction result type is Integer and the
+reducer is addition, then the Identity value should be zero, since adding zero
+to any value does not affect the value. Similarly, if the reducer is
+multiplication then the Identity value should be one, since multiplying any
+Integer value by 1 does not affect the value. If the result type is a vector and
+the reducer is concatenation, then the Identity value should be an empty vector,
+since concatenating an empty vector to another vector does not affect the value
+of the other vector.
 
 Note that the same rules presented for parallel blocks above apply to
-the update of shared variables, and for this purpose each iteration (or chunk) is
-treated as equivalent to a separate sequence of a parallel block.
+the update of shared variables, and for this purpose each iteration (or chunk)
+is treated as being a separate tasklet.
 
 !wording
 
 Append to Introduction (28)
-"A parallel block statement requests that two or more sequences of
- statements should execute in parallel with each other."
+"A concurrent block statement requests that two or more sequences of
+ statements should execute concurrently with each other, and in parallel
+ with each other if the parallel keyword is specified."
+
+Modify 2.9(2/3)
+Add "parallel" to the list of reserved words.
+
+Modify 4.3.3 (5.1/5)
 
+iterated_component_association ::= [parallel] for defining_identifier in discrete_choice_list => expression
+
 Modify 4.4(1/3)
 
 "In this International Standard, the term "expression" refers
@@ -246,7 +293,7 @@
 Modify 4.4(7/3)
 
 "primary ::=
-   numeric_literal | null | string_literal | aggregate {| <reduction_identity>}
+   numeric_literal | null | string_literal | aggregate
  | name | allocator | (expression)
  | (conditional_expression) | (quantified_expression) {| (reduction_expression)}"
 
@@ -280,12 +327,13 @@
 Add AARM note
 
 "The Identity aspect for the "and" function for modular types is not specified
- because it is difficult to specify a value that would work for all types.
- eg. type X is mod 97; We could say that it is T'Last for types where the modulus
- is a power of 2, for example, but that is not general enough, similarly, we could
- say that the Identity for the "and" function for a one-dimensional array is
- T'(others => True), but that only works if T is a constrained array type, so we
- dont bother trying to specify the Identity aspect for these operations."
+ because it is difficult to statically specify a value that would work for all
+ types, e.g. type X is mod 97. We could say that it is T'Last for types where the
+ modulus is a power of 2, for example, but that is not general enough.
+ Similarly, we could say that the Identity for the "and" function for a
+ one-dimensional array is T'(others => True), but that only works if T is a
+ constrained array type, so we don't bother trying to specify the Identity aspect
+ for these operations."
 
 Modify 4.5.3(2)
 
@@ -316,85 +364,210 @@
  function "/"(Left, Right : universal_fixed) return universal_fixed{ with Reducer => "*"
 "
 
+Modify 4.5.8(1/3)
+
+quantified_expression ::= [parallel] for quantifier loop_parameter_specification => predicate
+  | [parallel] for quantifier iterator_specification => predicate
+
+Modify 4.5.8 (6/4)
+
+For the evaluation of a quantified_expression, the
+loop_parameter_specification or iterator_specification is first
+elaborated. The evaluation of a quantified_expression then evaluates the
+predicate for the values of the loop parameter. {If the
+parallel keyword is not specified, these}[These] values are examined in
+the order specified by the loop_parameter_specification (see 5.5) or
+iterator_specification (see 5.5.2). {Otherwise these values are examined
+in an arbitrary order consistent with parallel execution.}
+
 Add Section 4.5.9
 
 "Reduction Expressions"
 
-Reduction expressions provide a way to write a reduction that combines a set of values
-into a single result.
+Reduction expressions provide a way to write a reduction that combines a set of
+values into a single result.
+
 Syntax
 
- reduction_identity ::= primary
+ reducer_function ::= function_specification
 
- reduction_expression ::= for loop_parameter_specification => combiner_function_call
-  | for iterator_specification => combiner_function_call
+ reduction_expression ::=
+    [parallel] for loop_parameter_specification => combiner_function_call
+  | [parallel] for iterator_specification => combiner_function_call
 
 combiner_function_call ::= function_call
-
-Wherever the Syntax Rules allow an expression, a reduction_expression may be used in
-place of the expression, so long as it is immediately surrounded by parentheses.
 
-Discussion: The syntactic category reduction_expression appears only as a primary that
-is parenthesized. The above rule allows it to additionally be used in other contexts
-where it would be directly surrounded by parentheses. This is the same rule that is
-used for conditional_expressions; see 4.5.7 for a detailed discussion of the meaning and
-effects of this rule.
+Wherever the Syntax Rules allow an expression, a reduction_expression may be
+used in place of the expression, so long as it is immediately surrounded by
+parentheses.
+
+Discussion: The syntactic category reduction_expression appears only as a
+primary that is parenthesized. The above rule allows it to additionally be used
+in other contexts where it would be directly surrounded by parentheses. This is
+the same rule that is used for conditional_expressions; see 4.5.7 for a detailed
+discussion of the meaning and effects of this rule.
 
 Name Resolution Rules
 
-The expected type of a reduction_expression is any constrained nonlimited type. The
-combiner_function_call in a reduction_expression is expected to return a result of the
-same type. The type of a reduction_identity included in the combiner_function_call is
-expected to be of the same type as the reduction_expression.
+The expected type of a reduction_expression is any nonlimited type. The
+combiner_function_call in a reduction_expression is expected to return a
+result of that same type.
+
+A reducer_function is a function that has exactly two parameters where
+both formal parameters and the result are of the same type. A
+reducer_function is called implicitly to combine results from
+multiple executions of the combiner_function_call of a reduction
+expression when the reduction expression has parallel execution. The
+function denoted by a combiner_function_call can be associated with a
+reducer_function. If such an association exists, either the function is
+itself a reducer_function or its declaration has the Reducer aspect
+specified, which indicates the associated reducer_function.
+
+An identity_value is a value that can be used to initialize implicit
+declarations of variables that accumulate the results of a
+reduction_expression. These accumulator variables are passed as the
+actual parameter associated with the reduction_expression_parameter of a
+combiner_function_call. The identity_value for the function denoted by a
+combiner_function_call is determined either from the Identity aspect
+specified on the declaration of the denoted function, or from the Identity
+aspect specified on the declaration of another function named by the
+Reducer aspect of the denoted function.
 
 Legality Rules
 
-The combiner_function_call of a reduction_expression shall include a function call that
-has an explicit_actual_parameter that is a reduction_identity.
+The combiner_function_call of a reduction_expression shall have exactly one
+explicit_actual_parameter that is a reduction_expression_parameter (See 6.4(6)).
 
-The combiner_function_call of a reduction_expression shall have a Reducer_Aspect if the
-reduction_expression includes the keyword parallel.
+If the parallel keyword is specified on the reduction_expression then
+the combiner_function_call shall either denote a function that is a
+reducer_function, or the denoted function shall have the Reducer aspect
+specified on its declaration.
+
+If the parallel keyword is specified on the reduction_expression, or the
+reduction_expression_parameter of the combiner_function_call does not
+specify an initial_reduction_value, then the combiner_function_call shall
+either denote a function that has the Identity aspect specified on its
+declaration, or the denoted function shall have the Reducer aspect
+specified on its declaration and the function named by the Reducer
+aspect shall have the Identity aspect specified on its declaration.
+
+The type of an initial_reduction_value or associated identity_value
+specified for a reduction_expression_parameter shall be of the same type
+as the reduction_expression.
+
+The result type and the type of both formal parameters of an associated
+reducer_function shall be of the same type as the reduction_expression.
+
+Static Semantics
 
-The value of a reduction_identity associated with a combiner_function_call shall be
-confirming with the Identity aspect of the function designated by the Reducer_Aspect of
-the combiner_function_call if the designated function has the Identity aspect.
+ For a function_specification, the following language-defined
+ operational aspects may be specified with an aspect_specification
+ (see 13.1.1):
 
+  Identity
+
+The aspect Identity denotes a value that is used as the
+identity_value associated with a Reduction_Expression. The aspect shall
+be specified by a static expression, and that expression shall be
+explicit, even if the aspect has a boolean type.
+
+Identity shall be specified only on a function_specification
+declaration.
+
+Reason: The part about requiring an explicit expression is to disallow
+omitting the value for this aspect, which would otherwise be allowed by
+the rules of 13.1.1.
+
+Aspect Description for Identity: Identity value for a function.
+
+The expected type for the expression specified for the Identity aspect
+is the result type of the function_specification declaration on which it
+appears.
+
+Only one Identity aspect may be applied to a single function declaration.
+
+
+  Reducer
+
+The aspect Reducer shall denote a reducer_function that is to be
+associated with the declared function. The denoted reducer_function
+shall have the following specification:
+
+ function reducer(L, R : in result_type) return result_type
+
+where result_type statically matches the subtype of the result of the
+declared function.
+
+Only one Reducer aspect may be applied to a single function declaration.
+
+Aspect Description for Reducer: Reducer function associated with the
+combiner_function_call of a reduction expression.
+
 Dynamic Semantics
 
-For the evaluation of a reduction_expression, the loop_parameter_specification or
-iterator_specification is first elaborated. The evaluation of a reduction_expression
-then evaluates the combiner_function_call for the values of the loop parameter in the
-order specified by the loop_parameter_specification (see 5.5) or iterator_specification
-(see 5.5.2).
-
-The value of the reduction_expression is determined as follows:
-For the first iteration, the value of the reduction_identity parameter is the value of
-the primary of the reduction_identity. For subsequent iterations, the value of the
-reduction_identity parameter is that of the result of the combiner_function_call for the
-previous iteration. If the Reduction_Expression does not evaluate any iterations, then
-the value of the Reduction_Expression is the value of the primary of the reduction_identity.
+For the evaluation of a reduction_expression, the loop_parameter_specification
+or iterator_specification is first elaborated, and accumulator variables of the
+type of the reduction_expression are implicitly declared. Each accumulator
+variable corresponds to a unique non-overlapping subset of the iterations to be
+performed, where all accumulators together cover the full set of iterations to
+be performed. The accumulator dedicated to the first iterations is initialized
+to the initial_reduction_value, if specified; otherwise it is initialized to the
+associated identity_value of the combiner_function_call. Any other accumulator
+variables are initialized to the associated identity_value of the
+combiner_function_call.
+
+The evaluation of a reduction_expression then evaluates the
+combiner_function_call for the values of the loop parameter in the order
+specified by the loop_parameter_specification (see 5.5) or
+iterator_specification (see 5.5.2) within each iteration subset.
+For each subsequent evaluation of the combiner_function_call within an
+iteration subset, the result from the previous evaluation is
+passed as an actual parameter to the reduction_expression_parameter of
+the current evaluation.
+
+The value of the reduction_expression is determined as follows: As accumulator
+results are determined, they are reduced into a single result two at a time by
+implicit calls to the reducer_function associated with the
+combiner_function_call. The accumulator results passed to the reducer_function
+are always for two adjacent iteration subsets where the result for the lower
+iteration subset is passed as the first actual parameter to the reducer_function
+and the result for the higher iteration subset is passed as the second actual
+parameter to the reducer_function. If the Reduction_Expression does not evaluate
+any iterations, then the value of the Reduction_Expression is the
+initial_reduction_value, if specified; otherwise it is the identity_value
+associated with the combiner_function_call.
+
+Notes: The accumulator results must represent adjacent iteration subsets
+as described above to ensure that non-commutative reductions will
+produce consistent results for parallel execution.
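+
+As a worked illustration of the chunking (assuming, for example, an
+implementation that splits the iterations into two chunks), the parallel
+reduction
+
+   (parallel for I in 1 .. 6 => <0> + I)
+
+might use one accumulator for iterations 1 .. 3 (yielding 6) and another
+for iterations 4 .. 6 (yielding 15); the reducer "+" then combines the
+two adjacent results as 6 + 15 = 21.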
 
 Examples
 
   A reduction expression to calculate the sum of elements of an array
 
-  (for parallel Element of Arr => <0> + Element)
+  (parallel for Element of Arr => <0> + Element)
 
   A reduction expression to calculate the minimum value of an array
 
-  (for parallel X of Arr => Integer'Min(<Integer'Last>,  X))
+  (parallel for X of Arr => Integer'Min(<Integer'Last>,  X))
 
   A reduction expression to create an unbounded string containing the alphabet
 
   (for Letter in 'A' .. 'Z' => <Null_Unbounded_String> & Letter)
 
-  A reduction expression to determine how many people in a database are 30 something
+  A reduction expression to create a string containing the alphabet
+
+  (for Letter in 'A' .. 'Z' => <""> & Letter)
+
+  A reduction expression to determine how many people in a database are over 30
   ThirtySomething : constant Natural :=
-     (for P of parallel Personel => <0> + (if Age(P) > 30 then 1 else 0));
+     (parallel for P of Personel => <0> + (if Age(P) > 30 then 1 else 0));
 
   An expression function that returns its result as a Reduction Expression
 
-   function Fact(N : Natural) return Natural is (for J in 1..N => <1> * J);
+   function Factorial(N : Natural) return Natural is (for J in 1..N => <1> * J);
 
   An expression function that computes the Sin of X using Taylor expansion
    function Sin(X : Float; Num_Terms : Positive := 5) return Float is
@@ -415,156 +588,191 @@
    Step : constant := 1.0 / Number_Of_Steps;
 
    Pi : constant Long_Float := Step *
-     (for I in parallel 1 .. Number_Of_Steps =>
+     (parallel for I in 1 .. Number_Of_Steps =>
          <0.0> + (4.0 / (1.0 + ((Long_Float (I) - 0.5) * Step)**2)));
 
 Wording Changes from Ada 2012
 Reduction expressions are new.
-
-Modify 5.5(4)
-Loop_parameter_specification ::=
-   defining_identifier in [reverse {[parallel]}] discrete_subtype_definition
 
-(Static Semantics)
+Modify 5.5(3/3)
 
-A reduction_list specifies that each declaration denoted by each name given
- in the list has the Reducer aspect (see 9.12.xx)
+Iteration_scheme ::= while condition
+   | [parallel] for loop_parameter_specification
+   | [parallel] for iterator_specification
+
+Modify 5.5 (6/5)
+
+A loop_parameter_specification declares a {set of }loop parameter{s},
+{where each loop parameter corresponds to a unique non-overlapping
+subset of iterations that together cover the full set of iterations to
+be performed.}
+[which is an ]{These} object{s'} [whose ]subtype{s} (and nominal subtype{s})
+[is]{are} that defined by the discrete_subtype_definition.
 
 Modify to 5.5(9/4)
 
 "For the execution of a loop_statement with the iteration_scheme being for
- loop_parameter_specification, the loop_parameter_specification is first elaborated.
- This elaboration creates the loop parameter{s} and elaborates the
- discrete_subtype_definition. {Multiple loop parameters may be created where each loop
- parameter is associated with a non-overlapping range of the iterations, if the keyword
- parallel is present, otherwise a single loop parameter is assumed. Each loop parameter
- can execute concurrently with other loop parameters of the same loop. Each thread of
- control proceeds independently and concurrently between the points where they interact
- with other tasks and with each other.} If the discrete_subtype_definition defines a
- subtype with a null range, the execution of the loop_statement is complete. Otherwise,
- the sequence_of_statements is executed once for each value of the discrete subtype
- defined by the discrete_subtype_definition that satisfies the predicates of the subtype
- (or until the loop is left as a consequence of a transfer of control). Prior to each
- such iteration, the corresponding value of the discrete subtype is assigned to the loop
- parameter. These values are assigned in increasing order unless the reserved word
- reverse is present, in which case the values are assigned in decreasing order.
-"
+ loop_parameter_specification, the loop_parameter_specification is first
+ elaborated. This elaboration creates the loop parameter{s} and elaborates the
+ discrete_subtype_definition. {Multiple loop parameters may be created where
+ each loop parameter is associated with a unique non-overlapping range of the
+ iterations if the keyword parallel is present, otherwise a single loop
+ parameter is assumed. Each loop parameter can execute concurrently with other
+ loop parameters of the same loop. Each thread of control proceeds independently
+ and concurrently between the points where they interact with other tasks and
+ with each other.} If the discrete_subtype_definition defines a subtype with a
+ null range, the execution of the loop_statement is complete. Otherwise, the
+ sequence_of_statements is executed once for each value of the discrete subtype
+ defined by the discrete_subtype_definition that satisfies the predicates of the
+ subtype (or until the loop is left as a consequence of a transfer of control).
+ Prior to each such iteration, the corresponding value of the discrete subtype
+ is assigned to the loop parameter. These values are assigned in increasing
+ order unless the reserved word reverse is present, in which case the values are
+ assigned in decreasing order. "
 
 AARM - An implementation should statically treat the
 sequence_of_statements as being executed by separate threads of control,
 but whether they actually execute in parallel or sequentially should be a
-determination that is made dynamically at run time, dependent on factors such
-as the available computing resources.
+determination that is made dynamically at run time, dependent on factors
+such as the available computing resources.
 
 Examples after 5.5(20)
 
-Example of a parallel loop without reduction
+Example of a parallel loop
 
-for I in parallel Buffer'Range loop
+parallel
+for I in Buffer'Range loop
    Buffer(I) := Arr1(I) + Arr2(I);
 end loop;
 
 
-"5.6.1 Parallel Block Statements
+"5.6.1 Concurrent Block Statements
 
-[A parallel_block_statement encloses two or more sequence_of_statements
-where all the sequence_of_statements can execute in parallel with each
-other.]
+[A concurrent_block_statement encloses two or more sequence_of_statements
+where all the sequence_of_statements can execute concurrently or possibly
+in parallel with each other.]
 
 Syntax
 
-parallel_block_statement ::=
-    parallel
+concurrent_block_statement ::=
+    [parallel] do
       sequence_of_statements
     and
       sequence_of_statements
    {and
       sequence_of_statements}
-    end parallel;
+    end do;
 
 Static Semantics
 
 Each sequence_of_statements represents a separate thread of control that
 proceeds independently and concurrently between the points where they
-interact with other tasks and with each other.
+interact with other tasks and with each other. If the parallel keyword
+is present, then each thread of control can execute on any available
+processor in parallel with other threads. Otherwise, the threads of
+control execute concurrently on the same processor as the enclosing task.
 
 AARM - An implementation should statically treat each
 sequence_of_statements as a separate thread of control, but whether they
-actually execute in parallel or sequentially should be a determination
+actually execute concurrently or sequentially should be a determination
 that is made dynamically at run time, dependent on factors such as the
 available computing resources.
 
 Examples
 
-Example of a parallel block statement:
+Example of a concurrent block statement:
 
-   parallel
+   do
      Foo(Z);
    and
      Bar(X, Y);
    and
      Put_Line ("Executing Foo and Bar in parallel with Other_Work");
      Other_Work;
-   end parallel;
+   end do;
 
+Modify 6.4(6)
+"explicit_actual_parameter ::= expression | variable_name {| reduction_expression_parameter}"
 
-Add C.7.1 (5.1)
+reduction_expression_parameter ::= <[initial_reduction_value]>
 
-The Task_Id value associated with each sequence_of_statements of a
-parallel_block_statement or of a loop statement is the same as that of the
-enclosing statement.
+initial_reduction_value ::= simple_expression
+
+
+Add 6.4(7.1) Legality Rules
 
-AARM - Each sequence_of_statements of a parallel block or parallel loop are
-treated as though they are all executing as the task that encountered the
-parallel block or parallel loop statement.
+A reduction_expression_parameter shall only be supplied as an actual
+parameter to a combiner_function_call of a reduction expression.
 
 Change 9.10 (13)
 
 "Both actions occur as part of the execution of the same task {unless
-they are each part of a different sequence_of_statements of a parallel
-block statement, parallel loop statement, or parallel reduction expression.}"
+ they are each part of a different sequence_of_statements of a:
+   - concurrent block statement,
+   - parallel loop statement,
+   - parallel quantified expression,
+   - parallel array aggregate, or
+   - parallel reduction expression.}"
 
 New section 9.12 Executors and Tasklets
 
 A task may distribute execution across different physical processors in
-parallel, where each execution is a separate thread of control that proceeds
-independently and concurrently between the points where they
-interact with other tasks and with each other. Each separate thread of control
-of a task is an Executor, and the execution that each executor performs between
-synchronization is a tasklet. When a task distributes its execution to a set
-of executors, it cannot proceed with its own execution until all the executors
-have completed their respective executions.
-
-A parallel block statement, parallel loop statement, or parallel reduction expression
-may assign a set of executors to execute the construct, if extra computing resources
-are available.
+parallel, where each execution is a separate thread of control that
+proceeds independently and concurrently between the points where they
+interact with other tasks and with each other. Each separate thread of
+control of a task is an Executor, and the execution that each executor
+performs between synchronization is a tasklet. When a task distributes
+its execution to a set of executors, it cannot proceed with its own
+execution until all the executors have completed their respective
+executions.
 
-For a function declaration, the following language-defined representation aspects may
-be specified:
+A concurrent block statement, parallel loop statement, parallel
+quantified expression, parallel array aggregate, or parallel reduction
+expression may assign a set of executors to execute the construct, if
+extra computing resources are available.
 
-Reducer The aspect Reducer denotes a function with the following
-specification:
+Modify A.1 (9.1)
 
- function reducer(L, R : in result_type) return result_type
+   -- function "and" (Left, Right : Boolean'Base) return Boolean'Base {with Identity => True};
+   -- function "or"  (Left, Right : Boolean'Base) return Boolean'Base {with Identity => False};
+   -- function "xor" (Left, Right : Boolean'Base) return Boolean'Base {with Identity => False};
 
-where result_type statically matches the subtype of the type declaration or
-subtype declaration.
+Modify A.1 (17)
 
-Only one Reducer aspect may be applied to a single function declaration;
+   -- function "+"   (Left, Right : Integer'Base) return Integer'Base {with Identity => 0};
+   -- function "-"   (Left, Right : Integer'Base) return Integer'Base {with Reducer => "+"};
+   -- function "*"   (Left, Right : Integer'Base) return Integer'Base {with Identity => 1};
+   -- function "/"   (Left, Right : Integer'Base) return Integer'Base {with Reducer => "*"};
 
-Identity The aspect Identity denotes a value that is to specified as the Identity_Value
-in a Reduction_Expression if the Combining_Function_Call of the Reduction_Expression
-either has the Identity aspect, or has a Reducer aspect that designates a function that
-has the Identity aspect. An Identity aspect can only be specified for a function
-declaration that has the following form;
+Modify A.1 (25)
 
- function reducer(L, R : in result_type) return result_type
+   -- function "+"   (Left, Right : Float) return Float {with Identity => 0.0};
+   -- function "-"   (Left, Right : Float) return Float {with Reducer  => "+"};
+   -- function "*"   (Left, Right : Float) return Float {with Identity => 1.0};
+   -- function "/"   (Left, Right : Float) return Float {with Reducer  => "*"};
 
-where result_type statically matches the subtype of the type declaration or
-subtype declaration.
+Modify A.1 (29-34)
 
-Only one Identity aspect may be applied to a single function declaration;
+function "*" (Left : root_integer; Right : root_real)
+     return root_real {with Identity => 1.0};
+
+   function "*" (Left : root_real;    Right : root_integer)
+     return root_real {with Identity => 1.0};
+
+   function "/" (Left : root_real;    Right : root_integer)
+     return root_real {with Reducer => "*"};
 
+   -- The type universal_fixed is predefined.
+   -- The only multiplying operators defined between
+   -- fixed point types are
+
+   function "*" (Left : universal_fixed; Right : universal_fixed)
+     return universal_fixed {with Identity => 1.0};
+
+   function "/" (Left : universal_fixed; Right : universal_fixed)
+     return universal_fixed {with Reducer => "*"};
+
 Modify A.4.4 (13-17)
 
  function Append (Left, Right : in Bounded_String;
@@ -689,6 +897,18 @@
     { with Identity => Empty_Set};
 
 
+Add C.7.1 (5.1)
+
+The Task_Id value associated with each sequence_of_statements of a
+concurrent_block_statement, parallel quantified expression, parallel
+reduction expression, parallel array aggregate, or parallel loop
+statement is the same as that of the enclosing statement.
+
+AARM - Each sequence_of_statements of a concurrent block, parallel
+quantified expression, parallel reduction expression, parallel array
+aggregate, or parallel loop is treated as though it is executing
+as the task that encountered the parallel construct.
+
 !discussion
 
 There is a continuing trend of exponential growth of computational
@@ -720,7 +940,7 @@
 annotations identifying global variable usage on subprogram
 specifications (see AI12-0079-1).
 
-Parallel Blocks
+Concurrent Blocks
 ---------------
 
 example:
@@ -730,11 +950,12 @@
       begin
 
          parallel
+         do
             X := Foo(100);
          and
             Z := Sqrt(3.14) / 2.0;
             Y := Bar(Z);
-         end parallel;
+         end do;
 
          Put_Line("X + Y=" & Integer'Image(X + Y));
       end;
@@ -745,8 +966,8 @@
 complain if the parallel sequences might have conflicting global
 side-effects.
 
-The parallel block construct is flexible enough to support recursive
-usage as well, such as:
+The concurrent block construct is flexible enough to support recursive usage as
+well, such as:
 
    function Fibonacci (N : Natural) return Natural is
       X, Y : Natural;
@@ -756,10 +977,11 @@
       end if;
 
       parallel
+      do
         X := Fibonacci (N - 2);
       and
         Y := Fibonacci (N - 1);
-      end parallel;
+      end do;
 
       return X + Y;
    exception
@@ -767,7 +989,7 @@
          Log ("Unexpected Error");
    end Fibonacci;
 
-We considered allowing the parallel block to be preceded with an
+We considered allowing the concurrent block to be preceded with an
 optional declare part, and followed with optional exception handlers,
 but it was observed that it was more likely to be useful to have objects
 that are shared across multiple parallel sequences to outlive the
@@ -777,13 +999,25 @@
 above. This simpler syntax is also more congruous with the syntax for
 select statements. Because there are no local declarations, there was
 also no point in having a statement_identifier (block label) for a
-parallel block. This is actually not entirely true. Allowing an exit
+concurrent block. This is actually not entirely true. Allowing an exit
 statement to replace a goto that targets the location following the
 end of the select statement seems useful. To allow such a statement,
-a block label might be needed to identify the parallel block that is
+a block label might be needed to identify the concurrent block that is
 being exited. It was felt that the need for allowing an exit statement
-in a parallel block could be the subject of a separate AI.
+in a concurrent block could be the subject of a separate AI.
 
+We considered whether a concurrent block statement without the parallel
+keyword should be sequential instead of concurrent. A concurrent
+construct seems more useful as it allows for abstractions such as
+coroutines where producers and consumers produce and consume data that
+is shared between tasklets. Such usage would work whether the parallel
+keyword is present or not in our model, but if instead the construct
+was sequential, then removing the parallel keyword from a working
+program could cause the application to deadlock, which is a safety concern.
+Also, there is not much point to a sequential block abstraction, since
+one could simply remove the construct and execute all the sequences
+sequentially.
+
 Parallel Loops
 --------------
 
@@ -813,59 +1047,64 @@
 For example, here is a simple use of a parallelized loop, to add two
 arrays together to produce a result array:
 
-  for I in parallel Loop_Iterations'Range loop
+  parallel
+  for I in Loop_Iterations'Range loop
     Result (I)  := A(I) + B(I);
   end loop;
 
 Note that the compiler, using the rules specified in AI12-0079-1, may
 complain if the parallel sequences might have conflicting global
 side-effects. We considered extending the syntax to allow reductions to be
-performed with parallel loops, but we discovered that parallel reduction expressions
-can provide the same capability more concisely, so for now we allow parallel loops
-to work if there are no global conflicts. A separate AI could be created to add
-reduction loop capability if it is decided that this is needed. In the meantime,
-Reduction Expressions appear to better fill this need. The main purpose of a loop is
-to apply processing iteratively. The main purpose of a reduction however, is to generate
-a result value. Iteration is a means to achieve that end, but a reduction expression seems
-better suited since it is designed to produce a result value.
+performed with parallel loops, but we discovered that parallel reduction
+expressions can provide the same capability more concisely, so for now we allow
+parallel loops to work if there are no global conflicts. A separate AI could be
+created to add reduction loop capability if it is decided that this is needed.
+In the meantime, Reduction Expressions appear to better fill this need. The main
+purpose of a loop is to apply processing iteratively. The main purpose of a
+reduction however, is to generate a result value. Iteration is a means to
+achieve that end, but a reduction expression seems better suited since it is
+designed to produce a result value.
 
 Reduction Expressions
 ---------------------
 
-A reduction expression provides a concise way to use iteration to combine a set of values
-into a single result.
+A reduction expression provides a concise way to use iteration to combine a set
+of values into a single result.
 
-We already have some special purpose reduction expressions in Ada in the form of quantified
-expressions.
+We already have some special purpose reduction expressions in Ada in the form of
+quantified expressions and array aggregates written using an
+iterated_component_association.
 
 Consider:
 
    All_Graduated : constant Boolean :=
-     (for all Student of parallel Class => Passed_Exam (Student));
+     (parallel for all Student of Class => Passed_Exam (Student));
 
    Someone_Graduated : constant Boolean :=
-     (for some Student of parallel Class => Passed_Exam (Student));
+     (parallel for some Student of Class => Passed_Exam (Student));
 
 
-Note that the keyword parallel would now be allowed in quantified expressions since
-for loops and quantified expressions have common syntax for loop parameter specifications.
+Note that the keyword parallel would now be allowed in quantified
+expressions.
 
 Here we are in both cases effectively reducing a set of Boolean values into a
 single result using quantified expressions.
 
-A similar effect can be written using the more generalized Reduction expression syntax.
+A similar effect can be written using the more generalized Reduction expression
+syntax.
 
    All_Graduated : constant Boolean :=
-     (for Student of parallel Class => <True> and Passed_Exam (Student));
+     (parallel for Student of Class => <True> and Passed_Exam (Student));
 
    Someone_Graduated : constant Boolean :=
-     (for Student of parallel Class => <False> or Passed_Exam (Student));
+     (parallel for Student of Class => <False> or Passed_Exam (Student));
 
-Here we use "and" and "or" as the combining_function_call of the Reduction expression.
-The initial value plays an important role, since the Identity for "and" must be true,
-and false for "or".
+Here we use "and" and "or" as the combining_function_call of the Reduction
+expression. The initial value plays an important role, since the Identity for
+"and" must be true, and false for "or".
 
-Another concern might be whether parallel loops can generally be written as expressions.
+Another concern might be whether parallel loops can generally be written as
+expressions.
 
 Consider the calculation of Pi:
 
@@ -873,7 +1112,7 @@
    Step : constant := 1.0 / Number_Of_Steps;
 
    Pi : constant Long_Float := Step *
-     (for I in parallel 1 .. Number_Of_Steps =>
+     (parallel for I in 1 .. Number_Of_Steps =>
          <0.0> + (4.0 / (1.0 + ((Long_Float (I) - 0.5) * Step)**2)));
 
 One might feel that the readability of this example might be improved if
@@ -897,26 +1136,26 @@
    end Pi_Slicer;
 
    Pi : constant Long_Float := Step *
-     (for I in parallel 1 .. Number_Of_Steps => Pi_Slicer (<0.0>, I));
+     (parallel for I in 1 .. Number_Of_Steps => Pi_Slicer (<0.0>, I));
 
 Which leaves a simpler looking Reduction Expression to do computation.
 
-Another concern might be about the need to perform multiple reductions at
-the same time.
+Another concern might be about the need to perform multiple reductions at the
+same time.
 
 Consider:
 
-   Sum : Integer := (for parallel Element of Arr => <0> + Element);
-   Min : Integer := (for parallel X of Arr => Integer'Min(<Integer'Last>,  X));
-   Max : Integer := (for parallel X of Arr => Integer'Max(<Integer'First>, X));
-
-Here we have three calculations that occur in parallel, but sequentially with respect
-to each other. The performance benefits of parallelism should be noticeable for larger
-arrays, but one might want to calculate these three results iterating only once through
-the array.
+   Sum : Integer := (parallel for X of Arr => <0> + X);
+   Min : Integer := (parallel for X of Arr => Integer'Min(<Integer'Last>,  X));
+   Max : Integer := (parallel for X of Arr => Integer'Max(<Integer'First>, X));
+
+Here we have three calculations that occur in parallel, but sequentially with
+respect to each other. The performance benefits of parallelism should be
+noticeable for larger arrays, but one might want to calculate these three
+results iterating only once through the array.
 
-This can be accomplished by creating a composite result type, and writing a user defined
-reducer function.
+This can be accomplished by creating a composite result type and writing a
+user-defined reducer function.
 
    type Summary is
       record
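          --  Illustrative sketch of how this example might continue; the
          --  component, function, and object names below are guesses, not
          --  the AI's actual text.
          Sum : Integer;
          Min : Integer;
          Max : Integer;
       end record;
 
    Identity : constant Summary :=
      (Sum => 0, Min => Integer'Last, Max => Integer'First);
 
    function Reduce (L, R : Summary) return Summary is
      (Sum => L.Sum + R.Sum,
       Min => Integer'Min (L.Min, R.Min),
       Max => Integer'Max (L.Max, R.Max));
 
    Result : constant Summary :=
      (parallel for X of Arr =>
         Reduce (<Identity>, (Sum => X, Min => X, Max => X)));
 
 With this, Sum, Min, and Max are all computed in a single parallel pass over
 Arr, as Result.Sum, Result.Min, and Result.Max.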
@@ -2242,7 +2481,8 @@
 > Release_Threshold of the barriers needs to be known in advance. The
 > calculation of each forecast for each city for each day is done in
 > parallel, but presumably, the weather calculations are such that one
-> cannot proceed to the next day until all the weather forecasting is complete for the previous day for all the cities.
+> cannot proceed to the next day until all the weather forecasting is complete
+> for the previous day for all the cities.
 
 I guess I understand the semantics, but it just feels too complicated to me.
 For something like this, two separate parallel loops, or an explicit array of
@@ -2357,9 +2597,17 @@
 >> Release_Threshold of the barriers needs to be known in advance. The
 >> calculation of each forecast for each city for each day is done in
 >> parallel, but presumably, the weather calculations are such that one
->> cannot proceed to the next day until all the weather forecasting is complete for the previous day for all the cities.
+>> cannot proceed to the next day until all the weather forecasting is complete
+>> for the previous day for all the cities.
 >
-> I guess I understand the semantics, but it just feels too complicated to me.  For something like this, two separate parallel loops, or an explicit array of tasks might be more appropriate.  Trying to synchronize across tasklets is going to be complex, a
nd the semantics will be subject to a lot of caveats, I suspect.  Tasklets can generally use run-to-completion semantics, and that clearly won’t work here with barriers in the middle.  If you want a barrier, you can end the parallel loop, and then start a 
new one, which creates a natural, highly visible, easy-to-understand barrier.
+> I guess I understand the semantics, but it just feels too complicated to me.
+> For something like this, two separate parallel loops, or an explicit array
+> of tasks might be more appropriate.  Trying to synchronize across tasklets
+> is going to be complex, and the semantics will be subject to a lot of
+> caveats, I suspect.  Tasklets can generally use run-to-completion semantics,
+> and that clearly won’t work here with barriers in the middle.  If you want a
+> barrier, you can end the parallel loop, and then start a new one, which
+> creates a natural, highly visible, easy-to-understand barrier.
 
 To do this as two separate loops, you need to store much more state. In this
 case, you'd need a two dimensional array and store all the values for all days
@@ -2457,11 +2705,20 @@
 > otherwise let the compiler decide how to deal with such transfer of
 > control.
 
-This is adding yet more complexity to the model.  And as pointed out above, if you have more tasklets than cores, some of them might not be started at all before the “exit."
->
+This is adding yet more complexity to the model.  And as pointed out above,
+if you have more tasklets than cores, some of them might not be started at
+all before the "exit."
+
 >>> ...
 >>>
->> I guess I understand the semantics, but it just feels too complicated to me.  For something like this, two separate parallel loops, or an explicit array of tasks might be more appropriate.  Trying to synchronize across tasklets is going to be complex, 
and the semantics will be subject to a lot of caveats, I suspect.  Tasklets can generally use run-to-completion semantics, and that clearly won’t work here with barriers in the middle.  If you want a barrier, you can end the parallel loop, and then start a
 new one, which creates a natural, highly visible, easy-to-understand barrier.
+>> I guess I understand the semantics, but it just feels too complicated to
+>> me.  For something like this, two separate parallel loops, or an explicit
+>> array of tasks might be more appropriate.  Trying to synchronize across
+>> tasklets is going to be complex, and the semantics will be subject to a lot
+>> of caveats, I suspect.  Tasklets can generally use run-to-completion
+>> semantics, and that clearly won’t work here with barriers in the middle.
+>> If you want a barrier, you can end the parallel loop, and then start a new
+>> one, which creates a natural, highly visible, easy-to-understand barrier.
 >
 > To do this as two separate loops, you need to store much more state.
 
@@ -2545,7 +2802,8 @@
 Sent: Monday, July 11, 2017  9:18 AM
 
 >>>> ...
->>> The semantics for exiting a loop early as currently proposed is that if any iteration invokes an “exit,” all other iterations are abandoned.
+>>> The semantics for exiting a loop early as currently proposed is that if any
+>>> iteration invokes an “exit,” all other iterations are abandoned.
 >>
 >> Not quite, the AI adds this note:
 >>
@@ -2585,7 +2843,14 @@
 
 ...
 >>>>
->>> I guess I understand the semantics, but it just feels too complicated to me.  For something like this, two separate parallel loops, or an explicit array of tasks might be more appropriate.  Trying to synchronize across tasklets is going to be complex,
 and the semantics will be subject to a lot of caveats, I suspect.  Tasklets can generally use run-to-completion semantics, and that clearly won’t work here with barriers in the middle.  If you want a barrier, you can end the parallel loop, and then start 
a new one, which creates a natural, highly visible, easy-to-understand barrier.
+>>> I guess I understand the semantics, but it just feels too complicated to me.
+>>> For something like this, two separate parallel loops, or an explicit array
+>>> of tasks might be more appropriate.  Trying to synchronize across tasklets
+>>> is going to be complex, and the semantics will be subject to a lot of
+>>> caveats, I suspect.  Tasklets can generally use run-to-completion
+>>> semantics, and that clearly won’t work here with barriers in the middle.
+>>> If you want a barrier, you can end the parallel loop, and then start a new
+>>> one, which creates a natural, highly visible, easy-to-understand barrier.
 >>
 >> To do this as two separate loops, you need to store much more state.
 >
@@ -4606,7 +4871,8 @@
 > above, whether the user is driving the "chunking" or the compiler does
 > it by itself, has exactly the same safety-check requirements.
 
-Not exactly, as noted above. But I definitely agree that whatever construct is used needs safety checks.
+Not exactly, as noted above. But I definitely agree that whatever construct is
+used needs safety checks.
 
 > We have been talking with existing Ada customers recently about
 > planned Ada 202X enhancements, and this is the one that gets them
@@ -4988,7 +5254,7 @@
 From: Randy Brukardt
 Sent: Wednesday, August 30, 2017  11:02 PM
 
-> I have some more thoughts towards answering the question on whether we 
+> I have some more thoughts towards answering the question on whether we
 > should eliminate parallel loop
 
 ...
@@ -5075,21 +5341,21 @@
 From: Tucker Taft
 Sent: Thursday, August 31, 2017  3:19 PM
 
->> You just need to prove two slices don't overlap, which is equivalent 
->> to proving the high bound of one is less than the low bound of the 
->> other.  This is the kind of thing that static analysis and more 
->> advanced proof tools are pretty good at doing!  This is not 
->> significantly harder than eliminating run-time checks for array 
->> indexing when the bounds are not known at compile-time, which is 
->> something that many Ada compilers can do based on other information 
+>> You just need to prove two slices don't overlap, which is equivalent
+>> to proving the high bound of one is less than the low bound of the
+>> other.  This is the kind of thing that static analysis and more
+>> advanced proof tools are pretty good at doing!  This is not
+>> significantly harder than eliminating run-time checks for array
+>> indexing when the bounds are not known at compile-time, which is
+>> something that many Ada compilers can do based on other information
 >> available at the point of usage.
-> 
-> OK, but again these are Legality Rules, not something that compilers 
-> are doing on their own. The entire set of rules have to be defined 
-> formally and required of all Ada compilers. What "static analysis" and 
-> "advanced proof tools" can or can't do is irrelevant (unless of course 
-> we allowed what is legal or not to be implementation-defined -- when I 
-> previously proposed that for exception contracts, the idea seemed to 
+>
+> OK, but again these are Legality Rules, not something that compilers
+> are doing on their own. The entire set of rules have to be defined
+> formally and required of all Ada compilers. What "static analysis" and
+> "advanced proof tools" can or can't do is irrelevant (unless of course
+> we allowed what is legal or not to be implementation-defined -- when I
+> previously proposed that for exception contracts, the idea seemed to
 > be as effective as a lead balloon).
 
 This is a general problem with data race detection, I would say.  My suggestion
@@ -5125,5 +5391,479 @@
 generic bodies, we often define things to raise Program_Error under certain
 circumstances, but for compilers that do macro-expansion, almost all of these
 end up as compile-time warnings, which are often treated as errors by the user.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Tuesday, October 10, 2017  12:15 AM
+
+I am trying to write up my homework for AI12-0119-1 (parallel operations),
+with an eye to possibly writing up AI12-0197-4 (Coroutines and Channels),
+and have come across some observations and ideas that I think improve on
+where we left off in Vienna, modified to account for some new issues.
+
+The basic change is that instead of parallel "begin ... end" for parallel
+blocks, I am planning to substitute "do ... end do".
+
+Eg.
+
+    parallel do
+       Foo;
+    and
+       Bar;
+    end do;
+
+My reasoning for this is:
+
+1) The sequence of statements in each branch should just be a
+  sequence-of-statements, rather than a handled-sequence-of-statements,
+  because I find that attempting to put an exception handler in this
+  construct is very confusing; I recall now that this is one of the main
+  reasons why the gang-of-four settled on syntax modeled after the select
+  statement rather than a begin statement for parallel blocks.
+
+For example, if one writes:
+
+    parallel begin
+       Foo;
+    and
+       Bar;
+    exception
+       when others =>
+         Put_Line ("Exception Caught!");
+    end;
+
+    I think it is very confusing. Is the exception handler only catching
+    exceptions for the call to Bar, or is it also catching exceptions for
+    the call to Foo? If we say that exception handlers are not allowed,
+    then it forces the programmer to either enclose the construct in a
+    block statement or equivalent, or use block statements within each arm
+    of the construct.  Either way, I find that to be much clearer.
+
+    e.g.
+
+     begin
+
+           parallel do
+              Foo;
+           and
+              Bar;
+           end do;
+
+     exception
+        when others =>
+          Put_Line("Exception Caught!");
+     end;
+
+or
+
+       parallel
+       do
+           begin
+              Foo;
+           exception
+              when others =>
+                Put_Line ("Bad Foo!");
+           end;
+
+        and
+
+           begin
+             Bar;
+           exception
+              when others =>
+                 Put_Line ("Bad Bar!");
+           end;
+
+        end do;
+
+The use of "do" has the benefit that "begin" from Vienna, had in that you can
+remove the keyword parallel and still have a working construct.
+The parallel keyword is a modifier. In the previous parallel block proposal
+this would not work as parallel was the name of the construct. If you remove
+"parallel", you have a bunch of code that needs to be cleaned up before it
+will compile.
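+
+For example (illustrative), dropping the modifier from the "do" form leaves a
+construct that is still syntactically complete:
+
+    do
+       Foo;
+    and
+       Bar;
+    end do;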
+
+Secondly, I don't think there is much use in having a declarative region in
+this construct. Generally, you need the results to outlive the construct, so
+that you can do something useful with the results.
+This suggests again that the construct should be enclosed in another construct
+such as a block statement.
+
+e.g.
+
+    declare
+       X : Integer;  -- Do Foo & Bar get their own copies of these?
+       Y : Float;
+    parallel
+    begin
+       X := Foo;
+    and
+       Y := Bar;
+    exception  --  Is this handler for both Foo & Bar, or just Bar?
+      when others =>
+        Put_Line("Exception Caught!);
+    end;
+
+    Put_Line (????);  -- !!! I Can't print out the results !!!
+
+In this example, two results have been calculated, but if you want to examine
+both of these results at the same time, you cannot, because the scope of the
+results has ended before you can examine them.
+
+Also, having a declarative region on a parallel begin is confusing. I think some
+users would be confused by this, wondering if each arm of the parallel begin
+gets its own local instances of those declarations, or whether they are shared
+between all the arms.
+
+Eliminating the declarative region removes this confusion, and the region is
+not likely to be missed, because it is not useful to begin with.
+
+If one instead writes:
+
+declare
+   X : Integer;
+   Y : Float;
+begin
+
+    parallel do
+       X := Foo;
+    and
+       Y := Bar;
+    end do;
+
+    Put_Line ("X=" & Integer'Image(X) & ", Y=" & FLoat'Image(Y));
+
+exception
+    when others =>
+      Put_Line("Exception Caught!");
+end;
+
+I find this to be unambiguous, and it provides the scope needed to examine the
+results of both arms.
+
+So it becomes clearer to me that this construct is quite different from a
+block statement, and therefore it probably should have its own distinct
+syntax, rather than trying to make the block statement accommodate this
+usage, which feels like a hack to me, and also feels like it would make a
+bit of a mess of the syntax.
+
+Why "do"?  The reserved keyword do is relatively unused in Ada, appearing only
+in the selective accept statement. It seems to fit better than begin in terms
+of English semantics.
+
+The construct really is a list of things to be done.
+   e.g.
+       Do X and Y and Z.
+
+   reads better than a list of things to be begun I think.
+
+       Begin X and Y and Z.
+
+   Begin really is more tied to a declarative region, where you
+   declare a bunch of things, then you need to specify where to "begin"
+   executing. Since this construct doesn't seem to need a declarative
+   region, there is no need to indicate where the execution begins.
+
+   Also many syntax constructs in Ada have the form
+      name
+      end name;
+
+      e.g.
+
+      loop
+      end loop;
+
+      if
+      end if;
+
+      select
+      end select;
+
+      case
+      end case;
+
+      record
+      end record;
+
+      If one sees "end" without looking at the above, the assumtion is that it
+      corresponds to a normal block statement.
+
+      If one sees "end do" it provides better feedback that the preceding code
+      has different semantics.
+
+      I suspect that the reason "begin" is not paired with
+      "end begin" mostly because end begin looks like a weird oxymoron
+      that would also be confusing to readers. "end do" does not seem
+      to have this problem.
+
+      Finally, I think the coroutine concept of AI12-0197 pretty much falls
+      out of this for free. If the keyword "parallel" is not present,
+      then the semantics could be that each arm gets its own executor,
+      similar to when the parallel keyword is present, but each executor
+      is tied to the same core; thus each arm of the construct executes
+      concurrently, but not in parallel, as only one arm can be executing
+      at one time. If you want more parallelism, simply add the parallel
+      keyword.
+
+      I don't think channels are needed, or anything else; one can use
+      existing capabilities in Ada to provide the channels.
+      For example, Ada.Containers.Unbounded_Synchronized_Queues could be
+      used to provide a channel.
+
+      Here is an example of coroutines involving two producers and one
+      consumer.  The first arm is an endless loop that produces integer
+      values, the second arm is a bounded loop that produces higher
+      integer values, and the third arm is the consumer which will end
+      up pulling values from both the other arms.
+
+      The Exit statement of the consumer is used here to terminate the
+      construct, which will abort the endless loop of the 1st arm.
+      There may be reasons why adding "Exit" to block statements would
+      not fit very well with the syntax. I suspect there would be less reason
+      to disallow Exit in a do construct.
+
+      Alternatively, this could be a goto, return, or exception, which
+      is treated as a transfer of control out of the construct, which
+      we've already discussed.
+
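+The Integer_Queues package used below could come from the standard
+synchronized queue containers; one possible instantiation (illustrative, and
+not part of the original message) is:
+
+   with Ada.Containers.Synchronized_Queue_Interfaces;
+   with Ada.Containers.Unbounded_Synchronized_Queues;
+
+   package Integer_Interfaces is
+      new Ada.Containers.Synchronized_Queue_Interfaces (Element_Type => Integer);
+   package Integer_Queues is
+      new Ada.Containers.Unbounded_Synchronized_Queues
+        (Queue_Interfaces => Integer_Interfaces);
+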
+declare
+   Queue : Integer_Queues.Queue;
+begin
+
+    do
+       declare
+          X : Integer := 0;
+       begin
+          loop    -- endless loop
+             Queue.Enqueue (X);
+             X := X + 1;
+          end loop;
+       end;
+
+    and
+
+       for I in 1 .. 3 loop
+          Queue.Enqueue (1_000_000 + I);
+       end loop;
+
+    and
+
+       declare
+          Value : Integer;
+       begin
+          for I in 1 .. 5 loop
+             Queue.Dequeue (Value);
+             Put_Line ("Value=" & Integer'Image (Value));
+          end loop;
+
+          Exit; -- or raise exception???
+       end;
+
+    end do;
+
+    I've tried this code in Paraffin, using Ada's asynchronous
+    transfer of control mechanism, and I see the expected results;
+
+    Value= 1000001
+    Value= 1000002
+    Value= 1000003
+    Value= 0
+    Value= 1
+
+The last two values might be output before the first three, depending on
+which arm gets to execute first. I think we could go for more deterministic
+behaviour and say that the initial order of execution is top down. Once all
+have had their initial start, they proceed in a concurrent fashion dependent
+on the behaviour of the "channel" and the blocking of the arm branches.
+
+    Anyway, I think these are compelling reasons for using "do" rather
+    than "begin", so I will write my homework up that way, unless someone
+    can convince me otherwise before then.
+
+    If we decide to go back to "begin", I don't think it will be a big
+    change to go that way. I just wanted to present these ideas earlier
+    so that it won't be as much a surprise when we meet in Boston,
+    and to possibly receive comment earlier.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, October 10, 2017  12:42 AM
+
+...
+> I just wanted to present these ideas earlier so that it won't be as
+> much a surprise when we meet in Boston, and to possibly receive
+> comment earlier.
+
+With the homework deadline about 40 hours from now, there isn't much time for
+comment and AI update. And last-minute flurries of mail are an issue for me,
+too, as it takes time to file e-mail threads. Ergo, at this point, just do it.
+
+(Also, do as I say, not as I do... ;-)
+
+>Finally, I think the coroutine concept of AI12-0197 pretty much falls out
+>of this for free. If the keyword "parallel" is not present, then the
+>semantics could be that each arm gets its own executor, similar to when
+>the parallel keyword is present, but each executor is tied to the same
+>core; thus each arm of the construct executes concurrently, but not in
+>parallel, as only one arm can be executing at one time.
+
+OK, but what is providing the "yield" semantics? Without that, it's just stupid
+tasking. And it can't require a lot of work to write, lest it hardly solve
+anything that can't already be solved with an existing task.
+
+> I don't think channels are needed, or anything else; one can use
+> existing capabilities in Ada to provide the channels.
+> For example, Ada.Containers.Unbounded_Synchronized_Queues could be
+> used to provide a channel.
+
+I don't think that is what Tucker had in mind -- if normal task communication
+would work, there'd be no need to propose such an idea in the first place. (Nor
+any reason for coroutines, but I digress.) In any case, I want to find out
+exactly what he had in mind, and trying to preempt him is not helpful.
+
+> There may be reasons why adding "Exit" to block statements would not
+> fit very well with syntax. I suspect there would be less reason to
+> disallow Exit in a do construct.
+
+Exit fits fine with the syntax of a block statement. But allowing it is wildly
+incompatible; consider the Ada 95 code:
+
+     loop
+         begin
+             exit when ...;
+         end;
+     end loop;
+
+If block statements had exit, this exit would exit the block, rather than the
+loop. That's clearly not the intention of the writer of the Ada 95 code.
+
+As you note, there's no compatibility problem with "do", but I don't think it is
+a particularly good idea to allow that in one construct but not in a similar
+one. Is this really necessary? It seems like a rare need, and a goto works fine
+(even if clunky).
+
+****************************************************************
+
+From: John Barnes
+Sent: Tuesday, October 10, 2017  1:53 AM
+
+I rather like do.
+
+****************************************************************
+
+From: Tullio Vardanega
+Sent: Tuesday, October 10, 2017  2:25 AM
+
+So do I.
+
+****************************************************************
+
+From: Brad Moore
+Sent: Tuesday, October 10, 2017  9:24 AM
+
+> Ergo, at this point, just
+> do it.
+
+Interesting choice of words. If you had said "just begin it", it would
+probably have lowered the chances of me getting it done. ;)
+
+>> Finally, I think the coroutine concept of AI12-0197 pretty much falls
+>> out of this for free. If the keyword "parallel" is not present, then
+>> the semantics could be that each arm gets its own executor, similar
+>> to when the parallel keyword is present, but each executor is tied to
+>> the same core; thus each arm of the construct executes concurrently,
+>> but not in parallel, as only one arm can be executing at one time.
+> OK, but what is providing the "yield" semantics? Without that, it's
+> just stupid tasking. And it can't require a lot of work to write, lest
+> it hardly solve anything that can't already be solved with an existing task.
+
+We could have two subtype attributes, 'Consume and 'Yield, allowed only in "do"
+statements; semantically they read and write an implicit buffer or queue of
+that subtype. The scope of the implicit queue would be tied to the scope of the
+do statement. This would be expected to work regardless of whether the do
+statement has the parallel keyword or not. The default length of the queue
+could be 1, but perhaps could be controlled by an aspect on the subtype
+declaration.
+
+e.g.
+
+do
+   for I in 1 .. 1_000_000 loop
+      Integer'Yield(I);
+   end loop;
+and
+   for I in 1 .. 1_000_000 loop
+      Float'Yield(Sqrt(Float(I)));
+   end loop;
+and
+   for I in 1 .. 10 loop
+      Put_Line ("The Square Root of" & Integer'Image(Integer'Consume) & " is"
+                & Float'Image(Float'Consume));
+   end loop;
+   goto Done;
+end do;
+
+<<Done>>
+
+****************************************************************
+
+From: Brad Moore
+Sent: Wednesday, October 11, 2017  4:01 PM
+
+Here is my homework for this AI. [This is version /04 of the AI - Editor.]
+
+I have applied major rework to the reduction expression construct, since it was
+very sketchy to begin with.
+
+I have also renamed parallel blocks to concurrent blocks, using the do ... and
+... end do syntax I described in a recent email. If the parallel keyword is not
+present, then the do statement executes concurrently within the same task, which
+is safer and more useful than just applying sequential execution.
+
+This will also provide a good starting point for AI12-0197, coroutines, since
+most of the construct is already in place.
+
+For parallel loops, the biggest change was to move the parallel keyword from
+the middle of the syntax to before the "for" keyword.
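+
+For example (an illustration of the change, not text quoted from the AI):
+
+   --  previously:  for I in parallel 1 .. N loop ... end loop;
+   --  now:         parallel for I in 1 .. N loop ... end loop;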
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Wednesday, October 11, 2017  4:01 PM
+
+> Here is my homework for this AI.
+
+I fixed some minor issues in this AI (without making a separate version):
+
+The reference to AI12-0064-1 is stale (that alternative is abandoned), and the
+aspect is named Nonblocking.
+
+The paragraph starting "Any transfer of control..." had two left-over uses of
+"parallel sequences", which I replaced by "concurrent sequences" to match the
+other changes (including earlier in the same sentence).
+
+The paragraph starting "Note that the same rules..." talks about "parallel
+blocks", which was otherwise changed to "concurrent blocks".
+
+There was a missing ) in the first paragraph of the Legality Rules for 4.5.9.
+
+There was a stray "Modify 5.5.2", which was removed.
+
+Some of the formatting was unusually narrow (while it was unusually wide last
+time - can't win, I guess).
+
+
+===
+
+Comment: Examples in the RM need to be complete, typically by depending on
+declarations from previous examples. None of the examples in 4.5.9 or 5.6.1 seem
+to do that. (A couple might be stand-alone, which is OK, but I didn't check
+carefully.) That needs to be fixed.
+
+Comment: You added "reducer" aspects as needed to Ada.Strings and the like. But
+don't the containers need something similar? I could do that in AI12-0112-1, but
+I'd have to know what is needed (and I'm not the right person to figure that
+out).
 
 ****************************************************************
