Version 1.5 of ai12s/ai12-0119-1.txt

Unformatted version of ai12s/ai12-0119-1.txt version 1.5
Other versions for file ai12s/ai12-0119-1.txt

!standard 5.5.2 (2/3)          17-06-10 AI12-0119-1/03
!class Amendment 14-06-20
!status work item 14-06-20
!status received 14-06-17
!priority Medium
!difficulty Hard
!subject Parallel operations
!summary
New syntax and semantics to facilitate parallelism via Parallel Loops, Parallel Blocks, and Parallel Reduction Expressions.
!problem
The increased presence of parallel computing platforms brings concerns to the general purpose domain that were previously prevalent only in the specific niche of high-performance computing. As parallel programming technologies become more prevalent in the form of new emerging programming languages and extensions of existing languages, safety concerns need to consider the paradigm shift from sequential to parallel behavior.
Ada needs mechanisms whereby the compiler is given the necessary semantic information to enable the implicit and explicit parallelization of code. After all, the current Ada language only allows parallel execution when it is semantically neutral (see 1.1.4(18) and 11.6(3/3)) or explicitly in the code as a task.
!proposal
This proposal depends on the facilities for aspect Global (AI12-0079-1) and for aspect Potentially_Blocking (AI12-0064-1). Those proposals allow the compiler to statically determine where parallelism may be introduced without introducing data races.
This proposal informally introduces the semantic notion of a Parallel OPportunity (POP) and a tasklet to the language. POPs are places in program text where work can be spawned to parallel executing workers that work in concert to correctly execute the algorithm. Tasklets are the notational (logical) units within a task that are executed in parallel with each other.
The goals of this proposal are: - To permit existing programs to benefit from parallelism through
minimal restructuring of sequential programs into ones that permit more parallel execution;
- To support the development of complete programs that maximize
parallelism;
- To efficiently guide the compiler in the creation of effective
parallelism (without oversubscription) with minimal input from the programmer;
- To support the parallel reduction of results calculated by parallel
computation;
- To avoid syntax that would make a POP erroneous or produce noticeably
different results if executed sequentially vs in parallel.
An important part of this model is that if the compiler is not able to verify that the parallel computations are independent, then a warning will be issued at compile time (See AI12-0079-1 and AI12-0064-1 for ways on how this can happen).
Note that in this model the compiler will identify any code where a potential data race occurs (following the rules for concurrent access to objects as specified in the Language Reference Manual RM 9.10(23), and point out where objects cannot be guaranteed to be independently addressable. If not determinable at compile-time, the compiler may insert run-time checks to detect data overlap.
Another issue is the underlying runtime. In this model, tasklets are orthogonal to tasks. Regardless of implementation, tasklets are considered to execute in the semantic context of the task where they have been spawned, which means that any operation that identifies a task, such as those in Task_Identification, will identify the task in which the tasklet is spawned. On the other hand, calls by different tasklets of the same task into the same protected object are treated as different calls resulting in distinct protected actions; therefore synchronization between tasklets can be performed using non-blocking protected operations. Note that this is consistent with the current standard which already supports multiple concurrent calls by a single task in the presence of the asynchronous transfer of control capability (RM 9.7.4 (23)).
There are three areas of this proposal; one that introduce capabilities for parallel blocks, another that introduces capabilities for parallel loops, and another that introduces capabilities for parallel reduction.
parallel Blocks ---------------
parallel blocks may be used to specify that two or more parts of an algorithm may be executed in parallel with each other.
Semantics: A parallel block statement encloses two or more sequences of statements (two or more "parallel sequences") separated by the reserved word "and". Each parallel sequence represents a separate tasklet, but all within a single Ada task. task identity remains that of the enclosing Ada task, and a single set of task attributes is shared between the tasklets.
With respect to the rules for shared variables (see RM 9.10(13)), two actions occurring within two different parallel sequences of the same parallel block are not automatically sequential, so execution can be erroneous if one such action assigns to an object, and the other reads or updates the same object or a neighboring object that is not independently addressable from the first object. The appropriate use of atomic, protected, or task objects (which as a group we will call synchronized objects) can be used to avoid erroneous execution.
In addition, the new Global and Potentially_Blocking aspects may be specified to enable the static detection of such problems at compile time (see AI12-0079-1 and AI12-0064-1).
Any transfer of control out of one parallel sequence will initiate the aborting of the other parallel sequences not yet completed. Once all other parallel sequences complete normally or abort, the transfer of control takes place. If multiple parallel sequences attempt a transfer of control before completing, one is chosen arbitrarily and the others are aborted.
If an exception is raised by any of the parallel sequences, it is treated similarly to a transfer of control, with the exception being propagated only after all the other sequences complete normally or due to abortion. If multiple parallel sequences raise an exception before completing, one is chosen arbitrarily and the others are aborted. The parallel block completes when all of the parallel sequences complete, either normally or by being aborted.
Note that aborting a tasklet need not be preemptive, but should prevent the initiation of further nested parallel blocks or parallel loops.
parallel Loops --------------
A common approach to parallelizing loops is to break the loop iterations into chunks where each chunk consists of a subset of the entire iteration set, and may be as small as a single iteration, up to as large as the entire number of iterations, or somewhere in-between. In Ada, each chunk becomes a separate tasklet.
The implementation is expected to determine appropriate values for the number of chunks and the size of each chunk for each parallel loop, without these having to be specifed by the programmer.
To indicate that a loop is a candidate for parallelization, the reserved word "parallel" may be inserted immediately after the word "in" or "of" in a "for" loop, at the point after where the "reverse" reserved word is allowed. Such a loop will be broken into chunks, where each chunk is processed sequentially.
Note that the same rules presented for parallel blocks above apply to the update of shared variables and the transfer of control to a point outside of the loop, and for this purpose each iteration (or chunk) is treated as equivalent to a separate sequence of a parallel block.
Reduction Expressions ---------------------
A Reduction Expression is a syntactic construct, similar to a quantified expression that can be used to combine a set of values into a single result. This mechanism is called reduction.
Indeed, a quantified expression can be viewed as being a special purpose Reduction Expression that applies an operation, the Predicate, to a set of values to reduce the predicate results to a single Boolean result. A Reduction expression is a more general syntactic form where there is less constraint on the type of operation to be applied to the set of values, and the operation can produce result types other than Boolean values.
Semantics: A Reduction Expression looks like a quantified expression, except that the quantifier keyword ("all" or "some") is not present, and the predicate expression is replaced with a function call that returns a constrained, nonlimited type, that is called the combiner_function_call. The Reduction expression must involve a function call having at least one parameter of the same type as the type of the Reduction expression. The combiner function is called iteratively to accumulating a single result, which becomes the result of the Reduction Expression. The accumulation of the result occurs as the result of each iteration is implicitly fed back as an input parameter to a function for each subsequent call. The implicit parameter is identified syntactically in the call based on the "<>" box notation syntax, except that the box notation encloses the initial value to be used for the first call to the function, since at that point there would otherwise be no existing value to pass into the function. The initial value also serves to be the result of the reduction expression for the case when there are no iterations performed (E.g. for the case when a when a loop_parameter_specification for a Reduction Expression specifies a null range).
To indicate that a Reduction Expression is a candidate for parallelization, the reserved word "parallel" may be inserted immediately after where the "reverse" reserved word is allowed. Similar to the way Parallel Loops are broken into chunks, a Parallel Reduction expression will also have chunking applied, where each chunk is processed sequentially. For each chunk, an accumulator result is generated that is local to the tasklet, and for the first call to the subprogram for each chunk the initial value is used to initialize the local accumulated result. As the chunk results are calculated in parallel, the result of the Reduction Expression is generated by combining/reducing the final function results of each chunk by applying a Reducer function for the Reduction Expression. The Reducer function for a parallel Reduction expression is a function that accepts two parameters where both parameters are of the same type, and returns a result of that same type, which is also the type of the Reduction expression. The Combiner function must have a Reducer aspect specified for its declaration if the function itself is not a Reducer function. The Reducer aspect identifies another function that is to be used to combine multiple chunk results into the final result of the Reduction expression. The multiple results are combined two at a time.
The initial value specified by the programmer must match (be confirming to) the value of the Identity aspect associated with the Reducer function for the Reduction Expression, if the Reducer function has the aspect. The Identity aspect of a function identifies a value of the same type as the function result. If the Identity aspect is not associated with the Reducer function, then the Initial value specified by the programmer is assumed to be correct.
The Reducer function is expected to generate an associative result based on the input parameters. A Reducer function does not need to be commutative (e.g. vector concatenation), as it is expected that the implementation will ensure that the results are combined in a manner that is consistent with sequentially applying the Reducer function to the chunk results in iteration order. If the parallel keyword is not present in the Reduction Expression, then sequential computation is assumed and the Reducer aspect does not need to be specified for the combiner function.
For parallel Reductions Expressions, it is important that the value of the Identity aspect associated with a function does not affect the result of the computation. For example, if the reduction result type is Integer and the reducer is addition, then the Identity value should be zero, since adding zero to any value does not affect the value. Similarly, if the reducer is multiplication then the Identity value should be one, since multiplying any Integer value by 1 does not affect the value. If the result type is a vector and the reducer is concatenation, then the Identity value should be an empty vector, since concatenating an empty vector to another vector does not affect the value of the other vector.
Note that the same rules presented for parallel blocks above apply to the update of shared variables, and for this purpose each iteration (or chunk) is treated as equivalent to a separate sequence of a parallel block.
!wording
Append to Introduction (28) "A parallel block statement requests that two or more sequences of
statements should execute in parallel with each other."
Modify 4.4(1/3)
"In this International Standard, the term "expression" refers to a construct of the syntactic category expression or of any of the following categories: choice_expression, choice_relation, relation, simple_expression, term, factor, primary, conditional_expression, quantified_expression {, reduction_expression}."
Modify 4.4(7/3)
"primary ::=
numeric_literal | null | string_literal | aggregate {| <reduction_identity>} | name | allocator | (expression) | (conditional_expression) | (quantified_expression) {| (reduction_expression)}"
Add 4.4(15.h)
"Wording Changes from Ada 2012
Added Reduction_expression to primary."
Replace 4.5.1(2-3) with
"The following logical operators are predefined for every boolean type T
function "and"(Left, Right : T) return T with Identity => True function "or" (Left, Right : T) return T with Identity => False function "xor"(Left, Right : T) return T with Identity => False"
The following logical operators are predefined for every modular type T
function "and"(Left, Right : T) return T function "or" (Left, Right : T) return T with Identity => 0 function "xor"(Left, Right : T) return T with Identity => 0"
The following logical operators are predefined for every one-dimensional array type T whose component type is a boolean type:
function "and"(Left, Right : T) return T function "or" (Left, Right : T) return T function "xor"(Left, Right : T) return T"
Add AARM note
"The Identity aspect for the "and" function for modular types is not specified
because it is difficult to specify a value that would work for all types. eg. type X is mod 97; We could say that it is T'Last for types where the modulus is a power of 2, for example, but that is not general enough, similarly, we could say that the Identity for the "and" function for a one-dimensional array is T'(others => True), but that only works if T is a constrained array type, so we dont bother trying to specify the Identity aspect for these operations."
Modify 4.5.3(2)
"function "+"(Left, Right : T) return T{ with Identity => T(0)}
function "-"(Left, Right : T) return T{ with Reducer => "+"}
"
Modify 4.5.5(2)
"function "*" (Left, Right : T) return T{ with Identity => 1}
function "/" (Left, Right : T) return T{ with Reducer => "*"} function "mod"(Left, Right : T) return T function "rem"(Left, Right : T) return T"
Modify 4.5.5(12) "function "*"(Left, Right : T) return T{ with Identity => 1.0}
function "/"(Left, Right : T) return T{ with Reducer => "*"}
"
Modify 4.5.5(14) "function ""(Left : T; Right : Integer) return T{ with Reducer => ""}
function ""(Left : Integer; Right : T) return T{ with Reducer => ""} function "/"(Left : T; Right : Integer) return T{ with Reducer => "*"}
"
Modify 4.5.5(19) "function "*"(Left, Right : universal_fixed) return universal_fixed{ with Identity => 1.0}
function "/"(Left, Right : universal_fixed) return universal_fixed{ with Reducer => "*"
"
Add Section 4.5.9
"Reduction Expressions"
Reduction expressions provide a way to write a reduction that combines a set of values into a single result. Syntax
reduction_identity ::= primary
reduction_expression ::= for loop_parameter_specification => combiner_function_call | for iterator_specification => combiner_function_call
combiner_function_call ::= function_call
Wherever the Syntax Rules allow an expression, a reduction_expression may be used in place of the expression, so long as it is immediately surrounded by parentheses.
Discussion: The syntactic category reduction_expression appears only as a primary that is parenthesized. The above rule allows it to additionally be used in other contexts where it would be directly surrounded by parentheses. This is the same rule that is used for conditional_expressions; see 4.5.7 for a detailed discussion of the meaning and effects of this rule.
Name Resolution Rules
The expected type of a reduction_expression is any constrained nonlimited type. The combiner_function_call in a reduction_expression is expected to return a result of the same type. The type of a reduction_identity included in the combiner_function_call is expected to be of the same type as the reduction_expression.
Legality Rules
The combiner_function_call of a reduction_expression shall include a function call that has an explicit_actual_parameter that is a reduction_identity.
The combiner_function_call of a reduction_expression shall have a Reducer_Aspect if the reduction_expression includes the keyword parallel.
The value of a reduction_identity associated with a combiner_function_call shall be confirming with the Identity aspect of the function designated by the Reducer_Aspect of the combiner_function_call if the designated function has the Identity aspect.
Dynamic Semantics
For the evaluation of a reduction_expression, the loop_parameter_specification or iterator_specification is first elaborated. The evaluation of a reduction_expression then evaluates the combiner_function_call for the values of the loop parameter in the order specified by the loop_parameter_specification (see 5.5) or iterator_specification (see 5.5.2).
The value of the reduction_expression is determined as follows: For the first iteration, the value of the reduction_identity parameter is the value of the primary of the reduction_identity. For subsequent iterations, the value of the reduction_identity parameter is that of the result of the combiner_function_call for the previous iteration. If the Reduction_Expression does not evaluate any iterations, then the value of the Reduction_Expression is the value of the primary of the reduction_identity.
Examples
A reduction expression to calculate the sum of elements of an array
(for parallel Element of Arr => <0> + Element)
A reduction expression to calculate the minimum value of an array
(for parallel X of Arr => Integer'Min(<Integer'Last>, X))
A reduction expression to create an unbounded string containing the alphabet
(for Letter in 'A' .. 'Z' => <Null_Unbounded_String> & Letter)
A reduction expression to determine how many people in a database are 30 something
ThirtySomething : contant Natural := (for P of parallel Personel => <0> + (if Age(P) > 30 then 1 else 0));
An expression function that returns its result as a Reduction Expression
function Fact(N : Natural) return Natural is (for J in 1..N => <1> * J);
An expression function that computes the Sin of X using Taylor expansion
function Sin(X : Float; Num_Terms : Positive := 5) return Float is (for I in 1..Num_Terms => <0.0> + (-1.0)**I * X**(2*I-1)/Float(Fact(2*I-1)));
A reduction expression that computes taxable income
Taxable_Income : constant Float := Total_Income +
(for Deduction of Deductions => <0.0> - Deduction);
A reduction expression that outputs the sum of squares
Put_Line ("Sum of Squares is" & Integer'Image(for I in 1 .. 10 => <0> + I**2));
A reduction expression to compute the value of Pi in parallel
Number_Of_Steps : constant := 100_000; Step : constant := 1.0 / Number_Of_Steps;
Pi : constant Long_Float := Step *
(for I in parallel 1 .. Number_Of_Steps =>
<0.0> + (4.0 / (1.0 + ((Long_Float (I) - 0.5) * Step)**2)));
Wording Changes from Ada 2012 Reduction expressions are new.
Modify 5.5(4) Loop_parameter_specification ::=
defining_identifier in [reverse {[parallel]}] discrete_subtype_definition
(Static Semantics)
A reduction_list specifies that each declaration denoted by each name given
in the list has the Reducer aspect (see 9.12.xx)
Modify to 5.5(9/4)
"For the execution of a loop_statement with the iteration_scheme being for
loop_parameter_specification, the loop_parameter_specification is first elaborated. This elaboration creates the loop parameter{s} and elaborates the discrete_subtype_definition. {Multiple loop parameters may be created where each loop parameter is associated with a non-overlapping range of the iterations, if the keyword parallel is present, otherwise a single loop parameter is assumed. Each loop parameter can execute concurrently with other loop parameters of the same loop. Each thread of control proceeds independently and concurrently between the points where they interact with other tasks and with each other.} If the discrete_subtype_definition defines a subtype with a null range, the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed once for each value of the discrete subtype defined by the discrete_subtype_definition that satisfies the predicates of the subtype (or until the loop is left as a consequence of a transfer of control). Prior to each such iteration, the corresponding value of the discrete subtype is assigned to the loop parameter. These values are assigned in increasing order unless the reserved word reverse is present, in which case the values are assigned in decreasing order.
"
AARM - An implementation should statically treat the sequence_of_statements as being executed by separate threads of control, but whether they actually execute in parallel or sequentially should be a determination that is made dynamically at run time, dependent on factors such as the available computing resources.
Examples after 5.5(20)
Example of a parallel loop without reduction
for I in parallel Buffer'range loop Buffer(I) := Arr1(I) + Arr2(I); end loop;
"5.6.1 Parallel Block Statements
[A parallel_block_statement encloses two or more sequence_of_statements where all the sequence_of_statements can execute in parallel with each other.]
Syntax
parallel_block_statement ::= parallel sequence_of_statements and sequence_of_statements {and sequence_of_statements} end parallel;
Static Semantics
Each sequence_of_statements represents a separate thread of control that proceeds independently and concurrently between the points where they interact with other tasks and with each other.
AARM - An implementation should statically treat each sequence_of_statements as a separate thread of control, but whether they actually execute in parallel or sequentially should be a determination that is made dynamically at run time, dependent on factors such as the available computing resources.
Examples
Example of a parallel block statement:
parallel Foo(Z); and Bar(X, Y); and Put_Line ("Executing Foo and Bar in parallel with Other_Work"); Other_Work; end parallel;
Add C.7.1 (5.1)
The Task_Id value associated with each sequence_of_statements of a parallel_block_statement or of a loop statement is the same as that of the enclosing statement.
AARM - Each sequence_of_statements of a parallel block or parallel loop are treated as though they are all executing as the task that encountered the parallel block or parallel loop statement.
Change 9.10 (13)
"Both actions occur as part of the execution of the same task {unless they are each part of a different sequence_of_statements of a parallel block statement, parallel loop statement, or parallel reduction expression.}"
New section 9.12 Executors and Tasklets
A task may distribute execution across different physical processors in parallel, where each execution is a separate thread of control that proceeds independently and concurrently between the points where they interact with other tasks and with each other. Each separate thread of control of a task is an Executor, and the execution that each executor performs between synchronization is a tasklet. When a task distributes its execution to a set of executors, it cannot proceed with its own execution until all the executors have completed their respective executions.
A parallel block statement, parallel loop statement, or parallel reduction expression may assign a set of executors to execute the construct, if extra computing resources are available.
For a function declaration, the following language-defined representation aspects may be specified:
Reducer The aspect Reducer denotes a function with the following specification:
function reducer(L, R : in result_type) return result_type
where result_type statically matches the subtype of the type declaration or subtype declaration.
Only one Reducer aspect may be applied to a single function declaration;
Identity The aspect Identity denotes a value that is to specified as the Identity_Value in a Reduction_Expression if the Combining_Function_Call of the Reduction_Expression either has the Identity aspect, or has a Reducer aspect that designates a function that has the Identity aspect. An Identity aspect can only be specified for a function declaration that has the following form;
function reducer(L, R : in result_type) return result_type
where result_type statically matches the subtype of the type declaration or subtype declaration.
Only one Identity aspect may be applied to a single function declaration;
Modify A.4.4 (13-17)
function Append (Left, Right : in Bounded_String; Drop : in Truncation := Error) return Bounded_String{ with Reducer => "&"};
function Append (Left : in Bounded_String; Right : in String; Drop : in Truncation := Error) return Bounded_String{ with Reducer => "&"};
function Append (Left : in String; Right : in Bounded_String; Drop : in Truncation := Error) return Bounded_String{ with Reducer => "&"};
function Append (Left : in Bounded_String; Right : in Character; Drop : in Truncation := Error) return Bounded_String{ with Reducer => "&"};
function Append (Left : in Character; Right : in Bounded_String; Drop : in Truncation := Error) return Bounded_String{ with Reducer => "&"};
Modify A.4.4 (21-26)
function "&" (Left, Right : in Bounded_String)
return Bounded_String{ with Identity => Null_Bounded_String};
function "&" (Left : in Bounded_String; Right : in String)
return Bounded_String{ with Reducer => "&"};
function "&" (Left : in String; Right : in Bounded_String)
return Bounded_String{ with Reducer => "&"};
function "&" (Left : in Bounded_String; Right : in Character)
return Bounded_String{ with Reducer => "&"};
function "&" (Left : in Character; Right : in Bounded_String)
return Bounded_String{ with Reducer => "&"};
Modify A.4.5 (15-19)
function "&" (Left, Right : in Unbounded_String)
return Unbounded_String{ with Identity => Null_Unbounded_String};
function "&" (Left : in Unbounded_String; Right : in String)
return Unbounded_String{ with Reducer => "&"};
function "&" (Left : in String; Right : in Unbounded_String)
return Unbounded_String{ with Reducer => "&"};
function "&" (Left : in Unbounded_String; Right : in Character)
return Unbounded_String{ with Reducer => "&"};
function "&" (Left : in Character; Right : in Unbounded_String)
return Unbounded_String{ with Reducer => "&"};
Modify A.18.2 (15/2-18/.2)
function "&" (Left, Right : Vector) return Vector
{ with Identity => Empty_Vector};
function "&" (Left : Vector;
Right : Element_Type) return Vector
{ with Reducer => "&"};
function "&" (Left : Element_Type;
Right : Vector) return Vector
{ with Reducer => "&"};
function "&" (Left, Right : Element_Type) return Vector
{ with Reducer => "&"};
Add after A.18.3 (32/2)
function "&" (Left, Right : List) return List
{ with Identity => Empty_List};
function "&" (Left : List;
Right : Element_Type) return List
{ with Reducer => "&"};
function "&" (Left : Element_Type;
Right : List) return List
{ with Reducer => "&"};
function "&" (Left, Right : Element_Type) return List
{ with Reducer => "&"};
Modify A.18.7(55/2)
function Union (Left, Right : Set) return Set { with Identity => Empty_Set};
Modify A.18.7(67/2)
function Symmetric_Difference (Left, Right : Set) return Set {with Identity => Empty_Set};
Modify A.18.8(27/2)
function Union (Left, Right : Set) return Set { with Identity => Empty_Set};
Modify A.18.8(35/2)
function Symmetric_Difference (Left, Right : Set) return Set { with Identity => Empty_Set};
Modify A.18.9(28/2)
function Union (Left, Right : Set) return Set { with Identity => Empty_Set};
Modify A.18.9(37/2)
function Symmetric_Difference (Left, Right : Set) return Set { with Identity => Empty_Set};
!discussion
There is a continuing trend of exponential growth of computational elements embedded on a single chip. This has led to significant challenges for software designers and implementers. Prior to 2005, the increase in transistor count corresponded to a similar increase in CPU clock speed, which boosted performance of sequential algorithms. Since then, CPU clock speeds have leveled off for reasons of practicality, and chip manufacturers have instead moved towards multicore technologies as a means of achieving performance increases.
As parallel technologies become more prevalent in the form of new emerging hardware platforms, safety concerns need to consider the paradigm shift from sequential to parallel behaviour. It is a paradigm that has been for a long time contained within a specialized domain of high-performance computing, but that is now required in all domains of computing systems.
It is important for the programmer to indicate places in the code where parallelism is desired, which allows the reader to better understand the algorithm, and allows the programmer to receive confirmation from the compiler whether the desired parallelism is safe, or even worthwhile.
This proposed model does not define syntax for the explicit parallelization of individual subprogram calls, since such parallelization can be performed implicitly by the compiler, when it knows that the calls are free of side-effects. This is facilitated by annotations identifying global variable usage on subprogram specifications (see AI12-0079-1).
parallel Blocks ---------------
example: declare X, Y : Integer; Z : Float; begin
parallel X := Foo(100); and Z := Sqrt(3.14) / 2.0; Y := Bar(Z); end parallel;
Put_Line("X + Y=" & Integer'Image(X + Y)); end;
In this example, the calculation of Z and Y occur sequentially with respect to each other, but in parallel with the calculation of X. Note that the compiler, using the rules specified in AI12-0079-1, may complain if the parallel sequences might have conflicting global side-effects.
The parallel block construct is flexible enough to support recursive usage as well, such as:
function Fibonacci (N : Natural) return Natural is X, Y : Natural; begin if N < 2 then return N; end if;
parallel X := Fibonacci (N - 2); and Y := Fibonacci (N - 1); end parallel;
return X + Y; exception when others => Log ("Unexpected Error"); end Fibonacci;
We considered allowing the parallel block to be preceded with an optional declare part, and followed with optional exception handlers, but it was observed that it was more likely to be useful to have objects that are shared across multiple parallel sequences to outlive the parallel block, and that having exception handlers after the last parallel sequence could easily be misconstrued as applying only to the last sequence. Therefore we reverted to the simpler syntax proposed above. This simpler syntax is also more congruous with the syntax for select statements. Because there are no local declarations, there was also no point in having a statement_identifier (block label) for a parallel block. This is actually not entirely true. Allowing an exit statement to replace a goto that targets the location following the end of the select statement seems useful. To allow such a statement, a block label might be needed to identify the parallel block that is being exited. It was felt that the need for allowing an exit statement in a parallel block could be the subject of a separate AI.
parallel Loops --------------
In most compute-intensive applications, a significant proportion of the computation time is spent in loops, either iterating over arrays/container data structures, or systematically searching a large solution space. To benefit from parallel hardware, the computation associated with a loop should be spread across the available processors.
One approach, presuming the iterations of the loop have no data dependences between them, is to treat each iteration of the loop as a separate tasklet, and then have the processors work away on the set of tasklets in parallel. However, this introduces overhead from the queuing and de-queuing of work items, and the communication of results from each work item. Furthermore, there often are data dependences between iterations, and creating a separate work item for each iteration can introduce excessive synchronization overhead to deal safely with these interdependences. Therefore, it is common to break large arrays, and/or the loops that iterate over them, into chunks (or slices or tiles), where each chunk is processed sequentially, but multiple chunks can be processed in parallel with one another.
Note: Allowing the parallel keyword on while loops was considered, but was discarded since while loops cannot be easily parallelized, because the control variables are inevitably global to the loop.
For example, here is a simple use of a parallelized loop, to add two arrays together to produce a result array:
for I in parallel Loop_Iterations'range loop Result (I) := A(I) + B(I); end loop;
Note that the compiler, using the rules specified in AI12-0079-1, may complain if the parallel sequences might have conflicting global side-effects. We considered extending the syntax to allow reductions to be performed with parallel loops, but we discovered that parallel reduction expressions can provide the same capability more concisely, so for now we allow parallel loops to work if there are no global conflicts. A separate AI could be created to add reduction loop capability if it is decided that this is needed. In the meantime, Reduction Expressions appear to better fill this need. The main purpose of a loop is to apply processing iteratively. The main purpose of a reduction however, is to generate a result value. Iteration is a means to achieve that end, but a reduction expression seems better suited since it is designed to produce a result value.
Reduction Expressions ---------------------
A reduction expression provides a concise way to use iteration to combine a set of values into a single result.
We already have some special purpose reduction expressions in Ada in the form of quantified expressions.
Consider:
All_Graduated : constant Boolean := (for all Student of parallel Class => Passed_Exam (Student));
Someone_Graduated : constant Boolean := (for some Student of parallel Class => Passed_Exam (Student));
Note that the keyword parallel would now be allowed in quantified expressions since for loops and quantified expressions have common syntax for loop parameter specifications.
Here we are in both cases effectively reducing a set of Boolean values into a single result using quantified expressions.
A similar effect can be written using the more generalized Reduction expression syntax.
All_Graduated : constant Boolean := (for Student of parallel Class => <True> and Passed_Exam (Student));
Someone_Graduated : constant Boolean := (for Student of parallel Class => <False> or Passed_Exam (Student));
Here we use "and" and "or" as the combining_function_call of the Reduction expression. The initial value plays an important role, since the Identity for "and" must be true, and false for "or".
Another concern might be whether parallel loops can generally be written as expressions.
Consider the calculation of Pi:
Number_Of_Steps : constant := 100_000; Step : constant := 1.0 / Number_Of_Steps;
Pi : constant Long_Float := Step * (for I in parallel 1 .. Number_Of_Steps => <0.0> + (4.0 / (1.0 + ((Long_Float (I) - 0.5) * Step)**2)));
One might feel that the readability of this example might be improved if temporary variable declarations were to be added for intermediate results. For instanced, it might be nice to replace the sub-expression;
(1 + ((Long_Float (I) - 0.5) * Step)**2)
with
(1.0 + X**2)
One ordinarily cannot declare variables in an expression, but one can always write a function which can be called from an expression.
We might instead decide to write:
function Pi_Slicer (R : Long_Float; Index : Natural) return Float with Reducer => "+" is X : constant Long_Float := (Long_Float (Index) - 0.5) * Step; begin return R + 4.0 / (1.0 + X ** 2); end Pi_Slicer;
Pi : constant Long_Float := Step * (for I in parallel 1 .. Number_Of_Steps => Pi_Slicer (<0.0>, I));
Which leaves a simpler looking Reduction Expression to do computation.
Another concern might be about the need to perform multiple reductions at the same time.
Consider:
Sum : Integer := (for parallel Element of Arr => <0> + Element); Min : Integer := (for parallel X of Arr => Integer'Min(<Integer'Last>, X)); Max : Integer := (for parallel X of Arr => Integer'Max(<Integer'First>, X));
Here we have three calculations that occur in parallel, but sequentially with respect to each other. The performance benefits of parallelism should be noticeable for larger arrays, but one might want to calculate these three results iterating only once through the array.
This can be accomplished by creating a composite result type, and writing a user defined reducer function.
type Summary is record Sum : Integer; Min : Integer; Max : Integer; end record;
-- Identity value for the Reduce function Identity : constant Summary := Summary'(Sum => 0, Min => Integer'Last, Max => Integer'First);
-- Reducer function for the Reduction expression function Reduce (L, R : Summary) return Summary with Identity => Identity is (Summary'(Sum => L + R, Min => Integer'Min (L, R), Max => Integer'Max (L, R)));
-- Combiner function for the Reduction expression function Process (L : Summary; X : Integer) return Summary with Reducer => Reduce is (Summary'(Sum => L.Sum + X, Min => Integer'Min (L.Min, X), Max => Integer'Max (L.Max, X)));
-- Reduction expression to compute all 3 results at once Result : constant Summary := (for parallel X of Arr => Process (<Identity>, X));
!ASIS
** TBD.
!ACATS test
ACATS B-Test and C-Test are needed for these features.
!appendix

From: Tucker Taft
Sent: Tuesday, June 17, 2014  10:18 AM

Here is the paper the "gang of 4" is submitting to HILT 2014 with a relatively
detailed proposal for parallel programming in Ada 202x:

     http://bit.ly/parallada-gangof4

If there is time at the upcoming ARG meeting, some discussion and feedback from
ARG members would be appreciated.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, June 17, 2014  12:52 PM

Are we going to get formal proposals for some/all of these features? For
instance, we have an open AI on global in/out (AI12-0079-1) which is certainly
more broadly applicable than just parallelism. There is also an AI for
Potentially_Blocking, AI12-0026-1, misnamed "Task_Safe", which is useful for any
program including tasking. In both of these cases, an important part of the
proposals will be how they interact with predefined and language-defined
subprograms. (If they're not specified for predefined "+", nothing useful could
be done, for example.)

As with all of these sorts of things, the devil is in the details. The basic
ideas look fine, but we'll never really know without detailed proposals that can
be tested against.

****************************************************************

From: Tucker Taft
Sent: Tuesday, June 17, 2014   1:54 PM

True, but it is not worth making a more detailed proposal if the overall
reaction is "yuck"!  ;-)

****************************************************************

From: Randy Brukardt
Sent: Tuesday, June 17, 2014   3:10 PM

Well, it looks great to me, but that's mainly because all of the details are
missing. A lot of stuff looks great without the details (it's how modeling
works, after all). The "yuck" only comes when the details are fleshed out...

****************************************************************

From: Bob Duff
Sent: Tuesday, June 17, 2014   5:52 PM

Sounds like there are some good ideas in the paper, but I am very strongly
opposed to ARG getting "detailed proposals" about how to do parallelism, or even
wasting time discussing the matter.  ARG should be moving more to the
"standardize existing practise" way of doing.  And that goes double for features
whose ONLY purpose is efficiency -- such as parallelism.

We (ARG) don't need detailed proposals in that area; we need an experimental
implementation, and MEASUREMENTS of the efficiency of realistic programs.  (And
don't try to show me speedups of the recursive Fibonacci -- that's downright
dishonest!)  Until then, ARG should avoid wasting any time on the matter.  Once
we have proven practise, then ARG should take charge of standardizing it.

My comments above don't apply so much to things like Globals, which are more
about correctness/safety than efficiency.  ARG could reasonably discuss Globals,
although even there, an experimental implementation would be helpful.

Regarding those: You can put Globals and Potentially_Blocking on a package.
That's good.  I think it should apply to child and nested packages of that
package. I think you also want a way to give a default for Overlaps_Storage: "no
overlaps unless specified, in this package and children".

****************************************************************

From: Robert Dewar
Sent: Wednesday, June 18, 2014  2:52 AM

> Sounds like there are some good ideas in the paper, but I am very
> strongly opposed to ARG getting "detailed proposals" about how to do
> parallelism, or even wasting time discussing the matter.  ARG should
> be moving more to the "standardize existing practise" way of doing.
> And that goes double for features whose ONLY purpose is efficiency --
> such as parallelism.

I am in full agreement with this.

At the last RTW, which I attended, I was struck by the fact that the RTW was
debating removing some features in Annex D which no one has ever implemented,
yet alone found useful in actual practice.

The RM is not a research proposal :-)

****************************************************************

From: Alan Burns
Sent: Wednesday, June 18, 2014  5:59 AM

The policy currently in Ada has been implemented by the Ada run-time from
Santandar, and is supported in EDF operating systems (not Ada), though I agree
these are mainly research and experimental platforms, not mainstream industrial
platforms.

****************************************************************

From: Robert Dewar
Sent: Wednesday, June 18, 2014  10:39 AM

But I am talking about the specific Ada features, which for sure have not been
used by anyone in an Ada program :-)

****************************************************************

From: Tucker Taft
Sent: Wednesday, June 18, 2014  8:38 AM

> The RM is not a research proposal :-)

I agree that many language standards come after the fact, and are used to try to
tame a wildly varying set of ad hoc extensions.  That has never been the story
with Ada.  There are relatively few extensions (other than as packages, pragmas,
or attributes) that Ada vendors or users try in any serious way, before the Ada
Rapporteur Group begins to talk seriously about them.  The ideal is that the ARG
gets the ball rolling, indicates a direction we are serious about, and then
vendors try prototyping the ideas, and then we come around to picking among the
"serious" ideas based on prototyping results.  I really don't see anyone beside
the ARG being taken seriously at this point as a fountain for "ideas worth
prototyping" (apologies to TED ;-).

****************************************************************

From: Stephen Michell
Sent: Sunday, June 22, 2014  10:24 PM

Attached is a report that ARG requested from Brad, Tucker  and myself in
investigating how OpenMP could be used to implement Ada parallelism. For
discussion at the ARG meeting this week.

-----

				Report to WG 9/ARG
 		  on the use of OpenMP for Parallelism in Ada

Stephen Michell
with contributions from
Luis Miguel Pinho
Brad Moore
Tucker Taft


Foreward

At the ARG meeting November 2014, ideas were presented about ways to
implement parallelism in Ada, and ways to interoperate with other
language systems while using the multicore capabilities of modern
computers. We were requested to investigate OpenMP from a technical
perspective as well as a business perspective as a potential solution
to implementing multicore parallelism in Ada.


OpenMP Characteristics

OpenMP is a set of libraries and language extensions that permit a
user to access the capabilities of multicore systems from within
certain languages - namely Fortran, C and C++.

OpenMP's parallelism is "explicit parallelism" - i.e. all aspects of
the parallelism are specified by compiler directives in the code that
invoke library routines at the place of the directive through OpenMP API's.

OpenMP lets the user specify code segments (usually "for loops",
blocks and functions) that can be executed by threads (possibly
operating on multiple cores) that are co-ordinated to implement an
algorithm. The language extensions are converted into library calls to
interact with a runtime to implement the algorithm.

In trying to implement OpenMP on an Ada system, one would note that
the concurrency model is quite different than Ada's. The
"threads"  that are created interact with other threads that use OpenMP
primitives. This means that they will not interact with Ada tasks
using Ada tasking primitives.

For the interoperability with Fortran, C and C++ systems that are
using OpenMP, an Ada OpenMP capability would be useful, whetehr or not
it interacted with Ada tasks. This brings us to the business model.

Business Model

To implement An OpenMP for Ada, one must participate in the OpenMP
community. One could develop compiler directives, either as aspects or
pragmas and convert calls to meet the API, but to have it
"standardized", one must be a full participant in the OpenMP group.

One can participate in OpenMP as a university, as an individual or as
a corporate participant. AdaCore was approached to see if they were
interested in joining OpenMP (and paying fees), but declined. There is
no point joining as individuals, as individuals cannot propose new
language specifications for standardization.

This leaves participation through a university. Barcelona
Supercomputing Group is a participant, and Miguel Pinho has approached them about
participating through Barcelona. Barcelona is receptive, but
participation requires that Miguel find a PhD student willing to
participate. To date there has been no final resolution.

There is a GNU OpenMP binding (GOMP) that supports the GNU Fortran, C
and C++ GNU compilers. Ada could define a thin binding to that
interface. This would still leave the challenge of defining Ada syntax
for the Open MP compiler directives and determining how such a library
would interact with the Ada runtime (Tasking is not the only issue -
elaboration issues are also a potential issue).

Conclusion

There are business reasons and technical reasons for creating an
OpenMP Ada implementation. If an interface with OpenMP through
Barcelona Supercomputing Group becomes possible, further exploration should
happen. Otherwise, no further action in this area is anticipated.

****************************************************************

From: Brad Moore
Sent: Monday, October 3, 2016  10:09 AM

[This is version /02 of the AI - Editor.]

Here is my attempt to refine ideas from the gang of 4 working on parallelism,
so we might have something more to talk about on this topic at the upcoming
ARG meeting. There are many ideas to choose from, so I chose what I thought
were the most promising, I'm not sure if all 4 of us would agree, though it is
possible that we might be in agreement, as we haven't yet gotten around to
deciding on the best way forward. Most of the writing related to parallel
blocks is unchanged from the previous version of this AI, and was mostly
written by Tucker. The updates I have are mostly to do with parallel loops
and reduction.

I have attempted to put in wording for these changes, but the wording is not
polished. The wording hopefully does capture and describe the intent at least.

The main hope is to gain some feedback, to see if this direction makes sense,
or if other directions should be pursued.

It would probably also be good to discuss whether this parallel loop proposal
is an improvement over what was written in the previous version of the AI,
which discussed the use of parallel arrays.

****************************************************************

From: Brad Moore
Sent: Saturday, June 10, 2017  1:18 AM

[This is version /03 of the AI - Editor.]

Here is a bit of a redesign in the area of parallel loops, and a new approach,
parallel reduction expressions. The parallel block part is mostly unchanged.

The parallel loop design in much simpler. It now just does parallel iteration,
leaving parallel reductions to reduction expressions.

We can add reduction back to loops later, possibly in a different AI if we
decide we need it, but for the time being, it seems like reduction expressions
are a better fit for the problem.

Parallel reduction expressions are very much similar to quantified expressions,
which is where I started thinking about this idea. It turns out that Tucker was
also thinking along these lines in Parasail, as Parasail apparently already has
this feature.

After some email exchange with the gang of 4, the more this idea evolved, the
more similar it became to the Parasail approach.

The basic idea is that a Reduction expression applies an operation using
iteration, where the previous result of the last iteration is fed back as an
input into the next iteration.

Special syntax, "<value>" is used to indicate where this feedback occurs, which
also encloses the initial value to be used for the first iteration.

Here are a few examples that are in the AI, to give a basic idea.

    -- Sum of the elements of an array
    Sum : Integer := (for parallel Element of Arr => <0> + Element);

    -- Minimum value of an array
    Min : Integer :=
     (for parallel X of Arr => Integer'Min(<Integer'Last>,  X));

    -- Construct the alphabet
    Alphabet : Unbounded_String :=
      (for Letter in 'A' .. 'Z' => <Null_Unbounded_String> & Letter);

    -- Number of people who are 30 or older
    ThirtySomething : contant Natural :=
      (for P of parallel database =>
          <0> + (if Age(P) > 30 then 1 else 0));

    -- Factorial function
    function Fact(N : Natural) return Natural is
      (for J in 1..N => <1> * J);

    -- Taxable_Income : constant Float := Total_Income +
      (for Deduction of Deductions => <0.0> - Deduction);

    --  Compute Taylor expansion for Sin of X
    function Sin(X : Float; Num_Terms : Positive := 5) return Float is
      (for I in 1..Num_Terms =>
         <0.0> + (-1.0)**I * X**(2*I-1)/Float(Fact(2*I-1)));

    -- Compute Pi
    Number_Of_Steps : constant := 100_000;
    Step : constant := 1.0 / Number_Of_Steps;

    Pi : constant Long_Float := Step *
      (for I in parallel 1 .. Number_Of_Steps =>
          <0.0> + (4.0 / (1.0 + ((Long_Float (I) - 0.5) * Step)**2)));

    -- Print the sum of squares
    Put_Line ("Sum of Squares is" &
              Integer'Image(for I in 1 .. 10 => <0> + I**2));

To see the similarity between Quantified expressions and the more general form,
Reduction expressions, consider:

    -- Quantified expressions about the graduating class

    All_Graduated : constant Boolean :=
      (for all Student of parallel Class => Passed_Exam (Student));

    Someone_Graduated : constant Boolean :=
      (for some Student of parallel Class => Passed_Exam (Student));


    -- The same result can be obtained by writing these as
    -- reduction expressions

    All_Graduated : constant Boolean :=
      (for Student of parallel Class =>
         <True> and Passed_Exam (Student));

    Someone_Graduated : constant Boolean :=
      (for Student of parallel Class =>
         <False> or Passed_Exam (Student));

I've taken a first crack at writing this up, but didn't spend a lot of time
wordsmithing. I'd hope to first get some feedback in Vienna to see this idea has
merit.

****************************************************************

From: Randy Brukardt
Sent: Monday, June 12, 2017  5:35 PM

> Here is a bit of a redesign in the area of parallel loops, and a new
> approach, parallel reduction expressions. The parallel block part is
> mostly unchanged.

Here's a couple barely informed comments on this.

First, a bit of a rant. It's disconcerting that we get a wildly different
proposal at each meeting. There's no opportunity to get the details right if we
change the basic proposal completely between every meeting.

In Pittsburgh, we spent a significant amount of time figuring out the right
model for reducers. I see here that all of the conclusions of that meeting have
completely been discarded in favor of what appears on first glance the
reducer-de-jour. Why should we even spend time trying to figure out this wildly
different proposal when it seems obvious that whatever we decide will be
discarded all over again next time?

I want to at least feel like we are making progress toward a solution, rather
than starting from nearly ground zero with each proposal. End of mini-rant.

...
> Special syntax, "<value>" is used to indicate where this feedback
> occurs, which also encloses the initial value to be used for the first
> iteration.

Regardless of the merits of the proposal itself (which I haven't thought about),
I don't think angle brackets is going to fly as syntax in Ada. There would be
substantial confusion with relational operators, which would likely make such
text unparsable. (We discussed this extensively with some of the assignment
target proposals; it didn't fly there and I don't see what could have changed in
8 months.)

You could use square or curly brackets for this, but in any case, such syntax
doesn't seem very Ada-like. I'd suggest using an attribute or "magic" function
call as the appropriate syntax.

> Here are a few examples that are in the AI, to give a basic idea.
>     -- Minimum value of an array
>     Min : Integer :=
>      (for parallel X of Arr => Integer'Min(<Integer'Last>,  X));

     Min : Integer :=
      (for parallel X of Arr => Integer'Min(<Integer'Last'Initial,  X));

or

     Min : Integer :=
      (for parallel X of Arr => Integer'Min(<Initial(Integer'Last),  X));

Maybe there is a better idea out there (neither of these look very promising),
but whatever we do needs to look and feel like Ada.

****************************************************************

Questions? Ask the ACAA Technical Agent