Version 1.10 of ai12s/ai12-0266-1.txt

Unformatted version of ai12s/ai12-0266-1.txt version 1.10
Other versions for file ai12s/ai12-0266-1.txt

!standard 5.5.1(4/3)          19-01-09 AI12-0266-1/07
!standard 5.5.1(6/4)
!standard 5.5.1(11/3)
!standard 5.5.2(4/3)
!class Amendment 18-03-28
!status work item 18-03-28
!status received 18-03-28
!priority Medium
!difficulty Medium
!subject Parallel Container Iterators
!summary
parallel iterators are defined for containers.
!problem
AI12-0119 provides mechanisms that can be used to iterate in parallel over the elements of an array. One would expect that there would be a similar need to be able to iterate over the elements of a container in parallel. Should such a capability be provided in Ada? (Yes).
!proposal
This proposal depends on the facilities for parallel loops (AI12-0119-1) and for aspect Nonblocking (AI12-0064-2).
The goals of this proposal are: - To extend parallel iteration to the standard container libraries
and user defined containers;
- To permit existing programs to benefit from parallelism through
minimal restructuring of sequential programs into ones that permit more parallel execution;
- To support the development of complete programs that maximize
parallelism;
- To efficiently guide the compiler in the creation of effective parallelism
with minimal input from the programmer;
- To avoid syntax that could generate erroneous behaviour or produce noticeably
different results if executed sequentially vs in parallel.
This proposal extends the interfaces in Ada.Containers.Iterator_Interfaces to include two new interfaces, a parallel iterator interface, and a reversible parallel iterator interface. Such iterators derived from these interfaces can be used for parallel iteration, but can also be used for sequential iteration for loops that do not have the parallel reserved keyword.
A parallel iterator provides a Split_Into_Chunks procedure that can be used to associate with the iterator a set of cursors that define the chunk boundaries for parallel iteration. Each cursor can be assigned to a different logical thread of control and identifies the first element of a different chunk of iterations where each chunk can be iterated through in parallel with other chunks to collectively cover all the elements of the container.
The starting cursor for each chunk of iteration can be obtained by calling the First_In_Chunk operation, passing in the chunk index of the split as a parameter to the call. The last cursor for each chunk of iteration can be obtained by calling the Last_In_Chunk operation, passing in the chunk index of the split as a parameter to the call. Similarly, the Next_In_Chunk and Previous_In_Chunk calls return the next or previous cursor in the chunk, and return a null cursor if the cursor is advanced past the last or before the first cursor of a chunk, respectively.
In addition, the loop syntax is extended to allow the loops with iterator specifications to have the parallel keyword.
!wording
Replace 5.5(3/3)
Iteration_scheme ::= while condition | [parallel] for loop_parameter_specification | [parallel] for iterator_specification
Modify 5.5.1 (2/5)
generic type Cursor; with function Has_Element (Position : Cursor) return Boolean; package Ada.Iterator_Interfaces is with Pure, Nonblocking => {Cursor'Nonblocking and Has_Element'Nonblocking}[False] is
Add after 5.5.1(4/3)
type Parallel_Iterator is limited interface and Forward_Iterator;
subtype Chunk_Index is Positive;
function Is_Split (Object : Parallel_Iterator) return Boolean is abstract;
procedure Split_Into_Chunks (Object : in out Parallel_Iterator; Max_Chunks : Chunk_Index) is abstract with Pre'Class => not Object.Is_Split or else raise Program_Error, Post'Class => Object.Is_Split and then Object.Chunk_Count <= Max_Chunks;
function Chunk_Count (Object : Parallel_Iterator) return Chunk_Index is abstract with Pre'Class => Object.Is_Split or else raise Program_Error;
function First_In_Chunk (Object : Parallel_Iterator; Chunk : Chunk_Index) return Cursor is abstract with Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error;
function Next_In_Chunk (Object : Parallel_Iterator; Position : Cursor; Chunk : Chunk_Index) return Cursor is abstract with Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error;
type Parallel_Reversible_Iterator is limited interface and Parallel_Iterator and Reversible_Iterator;
function Last_In_Chunk (Object : Parallel_Reversible_Iterator; Chunk : Chunk_Index) return Cursor is abstract with Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error;
function Previous_In_Chunk (Object : Parallel_Reversible_Iterator; Position : Cursor; Chunk : Chunk_Index) return Cursor is abstract with Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error;
Modify 5.5.1(6/3)
An iterator type is a type descended from the Forward_Iterator interface from some instance of Ada.Iterator_Interfaces. A reversible iterator type is a type descended from the Reversible_Iterator interface from some instance of Ada.Iterator_Interfaces. {A parallel iterator type is a type descended from the Parallel_Iterator interface from some instance of Ada.Iterator_Interfaces. A parallel reversible iterator type is a type descended from the Parallel_Reversible_Iterator interface from some instance of Ada.Iterator_Interfaces.} An iterator object is an object of an iterator type. A reversible iterator object is an object of a reversible iterator type. {A parallel iterator object is an object of a parallel iterator type. A parallel reversible iterator object is an object of a parallel reversible iterator type.} The formal subtype Cursor from the associated instance of Ada.Iterator_Interfaces is the iteration cursor subtype for the iterator type.
Modify 5.5.1(11/3)
An iterable container type is an indexable container type with specified Default_Iterator and Iterator_Element aspects. A reversible iterable container type is an iterable container type with the default iterator type being a reversible iterator type. {A parallel iterable container type is an iterable container type with the default iterator type being a parallel iterator type. A parallel reversible iterable container type is a reversible iterable container type with the default iterator type being a parallel reversible iterator type.} An iterable container object is an object of an iterable container type. A reversible iterable container object is an object of a reversible iterable container type. {A parallel iterable container object is an object of a parallel iterable container type. A parallel reversible iterable container object is an object of a parallel reversible iterable container type.}
Modify 5.5.2(4/3)
If the reserved word reverse appears, the iterator_specification is a reverse iterator{.}[;] {If the reserved word parallel appears, the iterator_specification is a parallel iterator;} otherwise it is a forward iterator. In a reverse generalized iterator, the iterator_name shall be of a reversible iterator type. {In a parallel generalized iterator, the iterator_name shall be of a parallel iterator type or a parallel reversible iterator type.} In a reverse container element iterator, the default iterator type for the type of the iterable_name shall be a reversible iterator type. {In a parallel container element iterator, the default iterator type for the type of the iterable_name shall be of a parallel iterator type or a parallel reversible iterator type.}
Modify AARM 5.5.2(8.a/5)
Ramification: The loop parameter{s} of a generalized iterator {have}[has] the same accessibility as the loop statement. This means that the loop parameter object{s} {are}[is] finalized when the loop statement is left. ([It]{They} also may be finalized as part of assigning a new value to the loop parameter{s}.) For array component iterators, the loop parameter{s each} directly denotes an element of the array and [has]{have} the accessibility of the associated array. For container element iterators, the loop parameter{s each} denotes the result of [the]{an} indexing function call (in the case of a constant indexing) or a generalized reference thereof (in the case of a variable indexing). Roughly speaking, the loop parameter{s have} [has] the accessibility level of a single iteration of the loop. More precisely, the function result (or the generalized reference thereof) is considered to be renamed in the declarative part of a notional block statement which immediately encloses the loop's sequence_of_statements; the accessibility of the loop parameter{s are} [is] that of the block statement.
Modify 5.5.2(10/3)
For a generalized iterator, the loop parameter{s are} [is] created, the iterator_name is evaluated, and the denoted iterator object becomes the loop iterator. In a forward generalized iterator, the operation First of the iterator type is called on the loop iterator, to produce the initial value for the loop parameter. If the result of calling Has_Element on the initial value is False, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed and then the Next operation of the iterator type is called with the loop iterator and the current value of the loop parameter to produce the next value to be assigned to the loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control. For a reverse generalized iterator, the operations Last and Previous are called rather than First and Next.
Add after 5.5.2(10/3)
For a parallel generalized iterator, the operation Split_Into_Chunks of the iterator type is then called, with the Max_Chunks parameter specified as the upper bound for the number of loop parameter objects (and the number of logical threads of control) to be associated with the iterator. The number of loop parameters that were created is determined by calling the Chunk_Count operation of the iterator. Each loop parameter can be assigned to a different logical thread of control, with an ordinal chunk index value. The Initial cursor values to be associated with each loop parameter is determined by calling the First_In_Chunk operation of the iterator using the chunk index value of the loop parameter object as the Chunk parameter for the call. The sequence_of_statements is executed for each loop parameter object and the Next_In_Chunk operation of the iterator type is called with the chunk index value of the loop parameter object and the current value of the loop parameter to produce the next value to be assigned to a given loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control. For a reverse parallel generalized iterator, the operations Last_In_Chunk and Previous_In_Chunk are called rather than First_In_Chunk and Next_In_Chunk.
AARM Note
The number of loop parameters specified for the Max_Chunks parameter of the Split_Into_Chunks call is only a recommendation by the implementation. A container implementation may choose a lower value if a more optimal split can be determined. For instance, a tree based container might create the split based on the number of branches at the top levels of the tree.
Modify AARM 5.5.2(10.a/4)
Ramification: The loop parameter{s} of a generalized iterator [is a]{are} variable{s} of which the user only has a constant view. It follows the normal rules for a variable of its nominal subtype. In particular, if the nominal subtype is indefinite, the variable is constrained by its initial value. Similarly, if the nominal subtype is class-wide, the variable (like all variables) has the tag of the initial value. Constraint_Error may be raised by a subsequent iteration if Next [or]{,} Previous{, Next_In_Chunk, or Previous_In_Chunk} return an object with a different tag or constraint.
Modify 5.5.2(11/3)
For an array component iterator, the iterable_name is evaluated and the denoted array object becomes the array for the loop. If the array for the loop is a null array, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed with [the] loop parameter{s that collectively denote} [denoting] each component of the array for the loop{. If the iterator is not a parallel array component iterator, the loop is iterated} using a canonical order of components, which is last dimension varying fastest (unless the array has convention Fortran, in which case it is first dimension varying fastest). For a forward array component iterator, the iteration starts with the component whose index values are each the first in their index range, and continues in the canonical order. For a reverse array component iterator, the iteration starts with the component whose index values are each the last in their index range, and continues in the reverse of the canonical order. {For a parallel array component iterator, the order of iteration is arbitrary.} The loop iteration proceeds until the sequence_of_statements has been executed for each component of the array for the loop, or until the loop is left as a consequence of a transfer of control.
Modify 5.5.2(12/3)
For a container element iterator, the iterable_name is evaluated and the denoted iterable container object becomes the iterable container object for the loop. The default iterator function for the type of the iterable container object for the loop is called on the iterable container object and the result is the loop iterator. [An object] {Objects} of the default cursor subtype [is]{are} created (the loop cursor{s}).
Modify 5.5.2(13/3) [into two paragraphs]
For a forward container element iterator, the operation First of the iterator type is called on the loop iterator, to produce the initial value for the loop cursor. If the result of calling Has_Element on the initial value is False, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed with the loop parameter denoting an indexing (see 4.1.6) into the iterable container object for the loop, with the only parameter to the indexing being the current value of the loop cursor; then the Next operation of the iterator type is called with the loop iterator and the loop cursor to produce the next value to be assigned to the loop cursor. This repeats until the result of calling Has_Element on the loop cursor is False, or until the loop is left as a consequence of a transfer of control. For a reverse container element iterator, the operations Last and Previous are called rather than First and Next.{
For a parallel container element iterator, the operation Split_Into_Chunks of the iterator type is called, with the Max_Chunks parameter specified as the upper bound for the number of loop parameter objects to be associated with the iterator. The number of loop parameters that were created is determined by calling the Chunk_Count operation of the iterator. Each loop parameter can be assigned to a different logical thread of control, with an ordinal chunk index value. The Initial cursor values to be associated with each loop parameter is determined by calling the First_In_Chunk operation of the iterator using the chunk index value of the loop parameter object as the Chunk parameter for the call. The sequence_of_statements is executed for each loop parameter object and the Next_In_Chunk operation of the iterator type is called with the chunk index value of the loop parameter object and the current value of the loop parameter to produce the next value to be assigned to a given loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control.} If the loop parameter is a constant (see above), then the indexing uses the default constant indexing function for the type of the iterable container object for the loop; otherwise it uses the default variable indexing function.
Modify 5.5.2(15/3)
-- Array component iterator example: {parallel} for Element of Board loop -- See 3.6.1.
Element := Element * 2.0; -- Double each element of Board, a two-dimensional array. end loop;
Modify A.18.2(74.1/3)
function Iterate (Container : in Vector) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.2(74.2/3)
function Iterate (Container : in Vector; Start : in Cursor) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.2(230.1/3)
function Iterate (Container : in Vector) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.2(230.2/3)
Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor as per the Next function when used as a forward iterator, and starting with the last node and moving the cursor as per the Previous function when used as a reverse iterator {, and starting with all nodes simultaneously using the Split function to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.2(230.3/3)
function Iterate (Container : in Vector; Start : in Cursor) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.2(230.4/3)
If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel }reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor as per the Next function when used as a forward iterator, or moving the cursor as per the Previous function when used as a reverse iterator {, or all nodes simultaneously using the Split function when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.3(46.1/3)
function Iterate (Container : in List) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.3(46.2/3)
function Iterate (Container : in List; Start : in Cursor) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.3(144.1/3)
function Iterate (Container : in List) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.3(144.2/3)
Iterate returns a {parallel }reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor as per the Next function when used as a forward iterator, and starting with the last node and moving the cursor as per the Previous function when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.3(144.3/3)
function Iterate (Container : in List; Start : in Cursor) return List_Iterator_Interfaces.Parallel_Reversible_Iterator'Class;
Modify A.18.3(144.4/3)
If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor as per the Next function when used as a forward iterator, or moving the cursor as per the Previous function when used as a reverse iterator {or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.5(37.1/3)
function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}[Forward_]Iterator'Class;
Modify A.18.5(61.1/3)
function Iterate (Container : in Map) return Map_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.5(61.2/3)
Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor according to the successor relation {when used as a forward iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.6(51.1/3)
function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.6(51.2/3)
function Iterate (Container : in Map; Start : in Cursor) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.6(94.1/3)
function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.6(94.2/3)
Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor according to the successor relation when used as a forward iterator, and starting with the last node and moving the cursor according to the predecessor relation when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.6(94.3/3)
function Iterate (Container : in Map; Start : in Cursor) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.6(94.4/3)
If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor according to the successor relation when used as a forward iterator, or moving the cursor according to the predecessor relation when used as a reverse iterator {, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.8(49.1/3)
function Iterate (Container : in Set) return Set_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.8(85.1/3)
function Iterate (Container : in Set) return Set_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.8(85.2/3)
Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the first element and moving the cursor according to the successor relation {when used as a forward iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator.} Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.9(61.1/3)
function Iterate (Container : in Set) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.9(61.2/3)
function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.9(113.1/3)
function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.9(113.2/3)
Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the first element and moving the cursor according to the successor relation when used as a forward iterator, and starting with the last element and moving the cursor according to the predecessor relation when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.9(113.3/3)
function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class;
Modify A.18.9(113.4/3)
If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the element designated by Start and moving the cursor according to the successor relation when used as a forward iterator, or moving the cursor according to the predecessor relation when used as a reverse iterator {, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.10(44/3)
function Iterate (Container : in Tree) return Tree_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.10(45/3)
function Iterate_Subtree (Position : in Cursor) return Tree_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.10(156/3)
function Iterate (Container : in Tree) return Tree_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.10(157/4)
Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element node in Container, starting from with the root node and proceeding in a depth-first order {when used as a forward_iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
Modify A.18.10(158/3)
function Iterate_Subtree (Position : in Cursor) return Tree_Iterator_Interfaces.{parallel}[Forward]_Iterator'Class;
Modify A.18.10(159/3)
If Position equals No_Element, then Constraint_Error is propagated. Otherwise, Iterate_Subtree returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in the subtree rooted by the node designated by Position, starting from with the node designated by Position and proceeding in a depth-first order {when used as a forward iterator, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. If Position equals No_Element, then Constraint_Error is propagated. Tampering with the cursors of the container that contains the node designated by Position is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization.
!discussion
Containers such as vectors are obvious candidates for parallelization, since it is easy to conceptually think of a vector as a being splittable into a number of smaller vectors, and then applying a divide and conquer approach to the vector subsets. However, even a sequentially accessed container such as a linked list can be iterated in parallel if cursors for each parallel executor can be obtained prior to execution of the loop. This typically involves iterating once through the container to obtain the cursors, but can still benefit from parallel processing if the amount of time to process each node in the loop is significantly greater than the amount of time needed to iterate through the container.
The container writer has the ability to implement the container to define an optimal number of loop parameters to be setup by the Split_Into_Chunks call, so long as the number of splits is not greater than the value of the Max_Chunks parameter.
In AI12-0119 we admitted the possibility of chunks, but the model was always one logical thread of control per iteration. Here (and in AI12-0251-1) we make chunks explicit in the model, with one logical thread of control per chunk. This is important to make it clear that all of the iterations associated with a chunk are part of a single logical thread of control, so there is no data race between the iterations of a single chunk.
This proposal allows the container implementer a fair amount of flexibility in deciding how to implement the cursor array stored in the iterator object. In general, an implementor should allow for as many elements in the cursor array, as there are cores on the multicore processor. But one could still decide to use a fixed sized array with enough elements to provide good parallelism for most typical processors available today. For instance, a fixed size cursor array of size 64 elements might be a reasonable size to provide good parallelism for most uses. On the other hand, it might make sense to allocate the cursor array dynamically during the call to Split. Then the array can be sized to match any number of cores. Another advantage of dynamic allocation is that the storage would not be needed for iterators associated with non-parallel loops. The iterator types of the containers are controlled types, so finalization of the iterator objects is already part of the iteration model.
It would have been nice to specify a post condition for the Split_Into_Chunks call something like;
with Post => (for all I in 1 .. Object.Split_Count =>
Has_Element (Object.Get_Cursor (I)))
However, this is not allowed because Has_Element cannot be called during a post condition in this package because the Cursor formal type is an untagged incomplete type.
Similarly, it would have been nice to write a Post condition for First_In_Chunk where calling Has_Element on the result would return true, however this is also not allowed for the same reason.
We could consider adding these as Pre and Post contracts in the derived types in each of the containers, but the iterator types are not declared in the public part of the package, so it seems to be not worthwhile to do so.
!ASIS
** TBD.
!ACATS test
ACATS B- and C-Tests are needed to check that the new capabilities are supported.
!appendix

From: Brad Moore
Sent: Wednesday, March 28, 2018  12:06 AM

Here is a new AI split off from AI12-0119 (Parallel Operations).
[This is version /01 of the AI - ED]

This one deals specifically with parallel container iteration.
The writeup is very much the same as before.

****************************************************************

From: Tucker Taft
Sent: Wednesday, March 28, 2018   9:11 PM

Thanks.  I am thinking this should probably also be in Chapter 9 after
introducing the notion of tasklets, rather than trying to weave it into the
current discussion of iterators, which is already pretty complex.

****************************************************************

From: Brad Moore
Sent: Thursday, June 14, 2018   1:15 AM

Here is an update to AI12-0266, [This is version /02 of the AI - Editor.]

The main changes are;

- Changed Chunk_Array type to be called Cursor_Array type.

The reason is that "Chunk" is not well defined in the RM, and an array of
chunks leaves a bit to the imagination.

A cursor array however seems a lot easier to visualize.

The semantics of Chunk_Array was that for every chunk there was a Start and
Finish call that could be used to determine the first and last cursor of each
chunk. One problem with this is that it is not clear if the last cursor is
inclusive or exclusive to the set of cursors to be iterated over.

With a cursor array, there is only one cursor per chunk, and only one call,
To_Cursor, available to get the starting cursor. To get the end cursor you
use the next highest index value, which should be more obvious that this is
an exclusive cursor value. The idea also is that if you specify an index value
higher than the number of chunks in the cursor array, you get a No_Element
result. So calling To_Cursor with index + 1 consistently gives you a cursor
value that can be compared against for chunk iteration termination.

Another change is that the Split operation is no longer a function returning
an object of a class-wide type. Instead, Split is a procedure that has an out
parameter of a class-wide type. (A change requested by Tucker)

By doing this, I think there needed to be a way for the container implementer
to tell the compiler which type to use for declaring the cursor array type. So
I added an Iterator_Cursor_Array aspect, similar to Iterator_Element aspect,
that may be specified on a container type declaration for this purpose.

Since the implementation is now responsible for declaring the cursor array
type, the idea is that the number of elements of the cursor array is now the
means to indicate to the container implementer, how many chunks to create.

A container implementer can use less than the number of elements of the cursor
array, but then must leave the remaining elements set as No_Element.

As a result of this, there no longer needs to be an Advised_Split parameter to
the Split call.

Another change is that since the Split function needs to store cursors into an
out parameter Cursor_Array class-wide object, there needed to be a way to
store a cursor into a cursor array object. So I added a Set_Cursor procedure
to the Cursor_Array interface.

****************************************************************

From: Brad Moore
Sent: Thursday, June 14, 2018  10:30 AM

A question I have about this approach, is it enough to use the
Iterator_Cursor_Array aspect to tell the compiler how to set up the cursor
array object?

Specifically, in my test example using Ada.Containers.Doubly_Linked_Lists, I
have the cursor array type having a visible discriminant to set the length of
the array.

Do we need to specify in the RM that it is expected that all cursor array
types will have a single discriminant of a type (Natural) indicating the
number of elements in a cursor array?

Or maybe we need another aspect to somehow tell the compiler how to specify
the cursor array length?

Or we could add a Set_Length function to the interface, but then that likely
implies a controlled type with dynamic memory allocation, which seems to be
undesirable.

This also has me wondering if we might have been better off leaving the Split
subprogram as a function rather than a procedure. Then the container
implementor can create tne cursor array object inside the Split call with the
desired length. We can then also simplify the interface by eliminating the
Set_Cursor call.

I note also that we do have other iterator calls that are functions returning
class-wide types, such as Iterate in the containers.

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 14, 2018   1:46 PM

> A question I have about this approach, is it enough to use the
> Iterator_Cursor_Array aspect to tell the compiler how to set up the
> cursor array object?

Let's start further back. Why do we need a Iterator_Cursor_Array object at all?

There would have been value to that approach if you could have declared an
actual array (fewer subprogram calls), but since it has to be private,
getting each cursor is going to involve a subprogram call.

So, what is the separate object gaining over just putting the Get_Cursor call
directly on the iterator object (the implementation of which is totally
user-controlled, so it could have an array of cursors internally if it wants)?
The "user" of this structure is the parallel loop, and it obviously has an
iterator object it is going to use to iterate. Why does it need anything
more??

So I would suggest that Split is just a call to tell the iterator how to
organize its internal data structure. Still need the Advised_Split parameter,
something like:

   procedure Split
     (Object        : in out Parallel_Iterator;
      Advised_Split : in Natural) is abstract;
      -- Tell Object to create split data with the advised number of splits.

   function Split_Length (Object : in Parallel_Iterator) return Natural is abstract;
      -- Determine the actual split length.

   function Split_Cursor (Object : in Parallel_Iterator;
                          Split_Index : in Natural) return Cursor;

or, since it would save an assignment in many implementations:

   procedure Split_Cursor (Object : in Parallel_Iterator;
                           Split_Index : in Natural;
                           Result : out Cursor);

Either of these would have the semantics you suggested yesterday (chunk ends
with the 'Succ index, get No_Element if Split_Index >= Length).

Do you need anything else?

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 14, 2018   8:10 PM

> > A question I have about this approach, is it enough to use the
> > Iterator_Cursor_Array aspect to tell the compiler how to set up the
> > cursor array object?
>
> Let's start further back. Why do we need a Iterator_Cursor_Array
> object at all?

Sorry, I meant Cursor_Array. I couldn't convince Outlook to let me view your
AI update in anything other than Notepad (which just shows a giant block of
text; it doesn't recognize Linux line endings) so I had to go from a quick
scan of your AI when I wrote this this afternoon (hoping for quick
turnaround). Now that I've posted it, I see what I did wrong here.

...
> So, what is the separate object gaining over just putting the
> Get_Cursor call directly on the iterator object (the implementation of
> which is totally user-controlled, so it could have an array of cursors
> internally if it wants)?

Answering my own question based on your most recent wording: I don't see
any. If we let the iterator implementor create a (possibly virtual) array
inside their implementation, we don't need any explicit object. It seems to
make *everything* simpler (no strange new aspect, no questions about
where/when the object gets declared, less to talk about in how the iterator
executes, etc.)

> The "user" of this structure is the parallel loop, and it obviously
> has an iterator object it is going to use to iterate. Why does it need
> anything more??

The actual implementation probably has to give a start/end cursor to each
thread when it is given work, but the format of that should be 100% up to the
implementation. The less we specify, the more likely it is that an implementer
can optimize it and create the fastest possible code.

> So I would suggest that Split is just a call to tell the iterator how
> to organize it's internal data structure. Still need the Advised_Split
> parameter, something like:
>
>    procedure Split
>      (Object        : in out Parallel_Iterator;
>       Advised_Split : in Natural) is abstract;
>       -- Tell Object to create split data with the advised number of splits.

Note that the internal cursor array (inside of a Parallel_Iterator'Class
object) could be fixed length (meaning that no more splits than the max number
is supported) or it could be dynamically allocated. I don't see much reason to
support massive numbers of splits anyway, certainly not to require it. (Perhaps
we might give a minimum number in Implementation Advice?)

>    function Split_Length (Object : in Parallel_Iterator) return
> Natural is abstract;
>       -- Determine the actual split length.

We'd need say what happens here if procedure Split hasn't been called. I'd
suggest saying that it returns 1.

>    function Split_Cursor (Object : in Parallel_Iterator;
>                           Split_Index : in Natural) return Cursor;
>
> or, since it would save an assignment in many implementations:
>
>    procedure Split_Cursor (Object : in Parallel_Iterator;
>                            Split_Index : in Natural;
>                            Result : out Cursor);
>
> Either of these would have the semantics you suggested yesterday
> (chunk ends with the 'Succ index, get No_Element if Split_Index >=
> Length).

Again, if Split hasn't been called, the first cursor is Object.First, and the
rest are No_Element.

> Do you need anything else?

I don't think so, based on my reading of your current text.

Assuming you concur, if you could update the AI this way before the deadline,
it would help progress it in Lisbon.

****************************************************************

From: Brad Moore
Sent: Thursday, June 14, 2018   10:54 PM

Thanks Randy, the idea of hiding the cursor array inside the iterator hadn't
occurred to me, but it looks like it does simplify things a lot, as well as
gives more flexibility to the implementation.

I will attempt to update the AI to include these ideas.

****************************************************************

From: Brad Moore
Sent: Friday, June 15, 2018   11:22 AM

>      -- Determine the actual split length.
>
>   function Split_Cursor (Object : in Parallel_Iterator;
>                          Split_Index : in Natural) return Cursor;
>
> or, since it would save an assignment in many implementations:
>
>   procedure Split_Cursor (Object : in Parallel_Iterator;
>                           Split_Index : in Natural;
>                           Result : out Cursor);
>


I am thinking of going with the function version, because

a) It is easier to work with, as it possibly saves having to declare two
objects, since the client needs both a start and end cursor, and

b) Because I suspect compilers would typically apply a transformation of
the users code to a procedure call accepting two cursors. The function results
can be inline expressions for the parameters to the call, so I dont think we'd
be saving an assignment in that case. Otherwise you have to assign to the
declared cursor variables, and then assign again to the parameters of the
call.

eg.

      --  2% Raise to everyone!
      parallel
      for Employee of List loop
          Employee.Salary := @ + @ * 0.02;
      end loop;

might get transformed to something like;

   declare
      procedure Give_Raise
        (From, To    : Employee_Lists.Cursor)
      is
         Position : Employee_Lists.Cursor := From;
         use type Employee_Lists.Cursor;
      begin --  Get_Payroll

         while Position /= To loop

            --  2% Raise to everyone!
            List (Position).Salary := @ + @ * 0.02;
            Employee_Lists.Next (Position);
         end loop;

      end Give_Raise;

      Iterator : Emp_List_Iterators.Parallel_Reversible_Iterator'Class :=
        List.Iterate;

   begin

      Iterator.Split (Advised_Split => 4);

      parallel
      for Chunk in 1 .. Iterator.Split_Count loop
         Give_Raise (From => Iterator.Get_Cursor (Chunk),
                     To   => Iterator.Get_Cursor (Chunk + 1));
      end loop;

   end;

****************************************************************

From: Brad Moore
Sent: Friday, June 15, 2018   11:43 AM

> Note that the internal cursor array (inside of a
> Parallel_Iterator'Class
> object) could be fixed length (meaning that no more splits than the
> max number is supported) or it could be dynamically allocated. I don't
> see much reason to support massive numbers of splits anyway, certainly
> not to require it. (Perhaps we might give a minimum number in
> Implementation Advice?)

The number of splits should typically be at least as many as the number of
available cores. For some platforms that number might be unknown.
For example, for IRTAW, I was interacting with a couple of people from the
Barcelona supercomputing center, where they were experimenting with
parallelism on 50 core computers. They managed to get my Paraffin code
running on that system, so I think it was a Linux based system.

I think storing the cursor array in the iterator for such a system would
suggest that the cursor array would needs to be dynamically allocated, (by
the Split call). As otherwise, the iterator object could become a very large
object to accomodate the largest possible number of cores, and that might
cause troubles if declared on a stack. If you know your target platform only
supports a small number of cores, then a fixed size array could be
appropriate. Another advantage of dynamically allocating is that if the Split
function is not called, (for example for a non-parallel loop), then nothing
needs to be freed upon finalization of the iterator, and no storage is needed
for the cursor array during the lifetime of the iterator object.

****************************************************************

From: Randy Brukardt
Sent: Friday, June 15, 2018   2:37 PM

I mentioned using a statically sized array because this has to work in the
bounded forms, where dynamic allocation and finalization is strongly
discouraged. Perhaps there is some way to use the secondary stack in GNAT
for this allocation, but otherwise the size needs to be known early.

I thought that a 64 cursor array probably would be enough for most existing
targets, and the size of that (128 words) is not going to be so large as to
cause stack problems. There's probably going to be memory use issues if
someone has 10,000 cores, but that's mostly theoretical (and I've read that
the number of cores is likely to be limited by memory contention issues).

For the unbounded forms, it's probably better to use dynamic allocation, but
obviously that's up to the implementation.

****************************************************************

From: Randy Brukardt
Sent: Friday, June 15, 2018   2:50 PM

> I am thinking of going with the function version, because
>
> a) It is easier to work with, as it possibly saves having to declare
> two objects, since the client needs both a start and end cursor, and

I thought the compiler would declare them somewhere anyway. In any case,
having thought on it more, I doubt that there would be any difference since
the objects in question are compiler-generated and thus it is not necessary
to preserve Ada semantics for their assignments.

> b) Because I suspect compilers would typically apply a transformation
> of the users code to a procedure call accepting two cursors. The
> function results can be inline expressions for the parameters to the
> call, so I dont think we'd be saving an assignment in that case.
> Otherwise you have to assign to the declared cursor variables, and
> then assign again to the parameters of the call.

I expected that the "manager" (the main thread) would set of an array of "job"
records, and the procedure call would simply pass one of the job records. For
this sort of iterator, the job would contain the two cursors.

Note that you're not really saving anything by directly passing cursors this
way. Cursors don't fit in a parameter on most targets, so they're probably
getting passed by reference. So there has to be a temporary object declared
somewhere to allow that.

If you're generating this expansion at the intermediate code level (as we
would do in Janus/Ada), you have declare all of these things explicitly anyway
(nothing implicit in intermediate code!). So while there might be a small
convinience when using a library (like Paraffin), it's irrelevant to an
implementation. (At least some compilers implement functions like this as
procedure calls anyway, so there isn't any difference at the implementation
level - unless an extra temporary is needed to meet Ada semantics, as in an
assignment.)

****************************************************************

From: Brad Moore
Sent: Friday, June 15, 2018   2:14 PM

Here is an update to AI12-266, Container Iterators [This is version /03 of the
AI - Editor.]

The main changes are to eliminate the cursor_array interface, and instead
imply that a cursor array or set of cursors is stored in the iterator object,
when Split is called. This approach overall is a lot simpler than the previous
version. There is no need for the Iterator_Cursor_Array aspect that the '
previous proposal had.

This gives the container implementor more freedom in designing data structures
for storing the loop parameters needed for parallel iteration.

I also changed the name of the calls, from To_Cursor to Get_Cursor, and from
Length to Split_Count, which I think adds clarity.

Also, instead of using Natural as the subtype for split indexes and cursor
counts, I defined

   subtype Cursor_Count is Positive;
   subtype Split_Index is Positive;

Which I think also adds clarity and ties the calls Split, Split_Count,
Get_Cursor, together more meaningfully.

Finally, I also added some pre and post conditions which help to make it
clearer how these calls work.

I would have liked to specify post conditions that check that
Has_Element(Get_Cursor (I)) for all I in 1 .. Object.Split_Count is true
after calling Split, or a post condition for Get_Cursor such as Post =>
(if Index = Object.Split_Count + 1 then Has_Element(Get_Cursor'Result) = False

But the compiler tells me that one cannot use Has_Element in a Pre or Post
condition in this package because the call involves a non tagged incomplete
type (Cursor)

I suppose we could add these as Pre and Post contracts to each of the
containers but the iterator types are not visible in the public part of the
package, so it seems not worthwhile to do so.

****************************************************************

From: Randy Brukardt
Sent: Friday, June 15, 2018   9:07 PM

> Here is an update to AI12-266, Container Iterators
>
> The main changes are to eliminate the cursor_array interface, and
> instead imply that a cursor array or set of cursors is stored in the
> iterator object, when Split is called. This approach overall is a lot
> simpler than the previous version.
> There is no need for the Iterator_Cursor_Array aspect that the '
> previous proposal had.

Thanks, this seems a lot simpler.

For the record, I one editorial fix.

Part of the interface referenced the package "Ada202x". I presume this is
something you used in your testing, since there is no such thing proposed
here or anywhere else (nor did it appear in the previous version of the AI).
So I replaced all of those with "Ada".


I'd probably add a bit of discussion similar to our recent back-and-forth on
how the cursor array could get implemented in the iterator object. Some people
might be concerned about the apparent need for dynamic allocation, and that
should explicitly addressed.

...
> I would have liked to specify post conditions that check that
> Has_Element(Get_Cursor (I)) for all I in 1 ..
> Object.Split_Count is true after calling Split, or a post condition
> for Get_Cursor such as Post => (if Index = Object.Split_Count + 1 then
> Has_Element(Get_Cursor'Result) = False
>
> But the compiler tells me that one cannot use Has_Element in a Pre or
> Post condition in this package because the call involves a non tagged
> incomplete type
> (Cursor)

That sounds correct. Perhaps the rule is a bit more restrictive than it needs
to be (one could allow the calls and then require a recheck for an instance -
an assume-the-best rule), but changing that doesn't sound pleasant.

> I suppose we could add these as Pre and Post contracts to each of the
> containers but the iterator types are not visible in the public part
> of the package, so it seems not worthwhile to do so.

Don't see the point. The most important use of such Pre/Post is to eliminate
the need to describe these restrictions and results in English. And they
wouldn't apply to user-defined iterators if you only put them into the
container.

****************************************************************

From: Brad Moore
Sent: Saturday, June 16, 2018  12:21 AM

> For the record, I one editorial fix.
>
> Part of the interface referenced the package "Ada202x". I presume this
> is something you used in your testing, since there is no such thing
> proposed here or anywhere else (nor did it appear in the previous version of
> the AI). So I replaced all of those with "Ada".

Yes, that was just for testing, I forgot to do the change you did this time
round.

> I'd probably add a bit of discussion similar to our recent
> back-and-forth on how the cursor array could get implemented in the
> iterator object. Some people might be concerned about the apparent
> need for dynamic allocation, and that should explicitly addressed.

I have attached another version with the discussion section updated to mention
the possibility of implementation using fixed size array vs dynamic allocation.
[This is version /04 of the AI - Editor.]

> ...
>> I would have liked to specify post conditions that check that
>> Has_Element(Get_Cursor (I)) for all I in 1 ..
>> Object.Split_Count is true after calling Split, or a post condition
>> for Get_Cursor such as Post => (if Index = Object.Split_Count + 1
>> then Has_Element(Get_Cursor'Result) = False
>>
>> But the compiler tells me that one cannot use Has_Element in a Pre or
>> Post condition in this package because the call involves a non tagged
>> incomplete type
>> (Cursor)
>
> That sounds correct. Perhaps the rule is a bit more restrictive than
> it needs to be (one could allow the calls and then require a recheck
> for an instance - an assume-the-best rule), but changing that doesn't
> sound pleasant.

The more I think about this, the more I think one ought to be able to specify
such class-wide pre and post conditions for abstract subprograms.

I wonder if there might be a way to relax the rules in a simple way to allow
this at least. Anyway, you are probably right that it might not be pleasant,
but maybe something to consider if it would be worthwhile, at some point.

****************************************************************

From: Randy Brukardt
Sent: Saturday, June 16, 2018  2:56 AM

> > I'd probably add a bit of discussion similar to our recent
> > back-and-forth on how the cursor array could get implemented in the
> > iterator object. Some people might be concerned about the apparent
> > need for dynamic allocation, and that should explicitly addressed.
>
> I have attached another version with the discussion section updated to
> mention the possibility of implementation using fixed size array vs
> dynamic allocation.

You'll excuse me if I don't post this now; it's 12 hours past the deadline,
I've already posted the agenda and the files and loaded my laptop and there's
water on the floor in here from a severe thunderstorm that just went thru...

> >
...
> > That sounds correct. Perhaps the rule is a bit more restrictive than
> > it needs to be (one could allow the calls and then require a recheck
> > for an instance - an assume-the-best rule), but changing that
> > doesn't sound pleasant.
> >
>
> The more I think about this, the more I think one ought to be able to
> specify such class-wide pre and post conditions for abstract
> subprograms.

The problem has nothing to do with preconditions of abstract subprograms.
It's that you're doing it in a generic with a formal incomplete type that
isn't supposed to be used in any way other than to declare access types and
subprogram parameters. And that was because we needed this generic to work on
a *private* cursor. And you might remember that we were never able to find an
adequate solution for *that* problem.

> I wonder if there might be a way to relax the rules in a simple way to
> allow this at least.
> Anyway, you are probably right that it might not be pleasant, but
> maybe something to consider if it would be worthwhile, at some point.

I was thinking that we might be able to allow it in a generic if we require
the recheck. However, the recheck would fail in this case (the type being
private) so I don't think that helps much. In any case, incomplete types are
not supposed to have any sort of object created for them, and allowing this
case without allowing anything that causes real trouble sounds painful. It
would have been better if we didn't need to use this hack to get an instance
of a private type - but we *know* that was painful.

****************************************************************

From: Brad Moore
Sent: Sunday, October 14, 2018  3:53 PM

Here is a modest update to AI12-0266-1 Parallel Container Iteration.
[This is version /05 of the AI - Editor.]

Mostly the changes involve renaming types and subprograms to address ARG
comments. A Chunk_Finished call was added to the Iterator_Interfaces, rather
than expecting the compiler to compare the current cursor with another cursor
value, which cant be counted on, since equality checks are not part of the
interface.

One thought for consideration was to have the Split_Into_Chunks call somehow
temporary invalidate all existing cursors, and have the First, Next, Last, and
Previous only allow retrieving cursors associated with the current chunk and a
particular logical thread of control.

That idea held some appeal, as it would seem to add safety to the containers. It
would be more difficult for the user to write code to iterate from one chunk
into another. That would also allow us to eliminate the call to Chunk_Finished.
Since we are already calling Next to Advance the cursor, we could define Next to
return No_Element if one advances to the end of a chunk, and do the usual
comparison against No_Element to terminate the iteration.

In any case, this idea did not make it into the AI in any form. Just food for
thought. Perhaps that idea could be discussed at the meeting if it has any
merit.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, October 16, 2018  11:36 PM

...
> One thought for consideration was to have the
> Split_Into_Chunks call somehow
> temporary invalidate all existing cursors, and have the
> First, Next, Last, and Previous only
> allow retrieving cursors associated with the current chunk
> and a particular logical thread of control.
>
> That idea held some appeal, as it would seem to add safety to
> the containers. ...

Sure, but what would the implementation be? A canonical cursor is a pair of
access values (one to the container, one to the node containing the element).
How would the implementation figure out if the cursors are "temporarily
invalidated"? Walking lists in the container might work, but would be O(N) and
cursors are supposed to be O(1). And you'd have to worry about tasking
interactions (there are many logical threads going here at once). Sounds to me
like it would make tampering look like a cheap idea in contrast. :-)

---

A nitpick:

>A parallel iterator provides a Split_Into_Chunks procedure that can be used
>to associate with the iterator a set of cursors that define the chunk
>boundaries for parallel iteration. Each cursor is assigned to a different
>logical thread of control ...

Probably should be "can be assigned", since there is no requirement on the
implementation to use different threads of control for each chunk. (One might
like that, but it's not always practical.) Or, maybe just drop the whole
discussion about "logical thread of control" here and just focus on the chunk.
I'll let you decide on this, I know you are sending me another change that we've
been discussing privately.

---

> ... A parallel reversible iterator type is a type descended
from both the Parallel_Reversible_Iterator interace from some
instance of Ada.Iterator_Interfaces.}

"both"? There's only one "interace" mentioned. "Interace" sounds like it could
get political. :-)

I fixed both of these.

---

I reformatted a lot of paragraphs to keep the lines under 80 characters. Some
people appreciate that.

****************************************************************

From: Randy Brukardt
Sent: Monday, November 19, 2018  11:18 PM

The (very drafty) minutes for AI12-0266-1 say:

> Can Split return zero chunks? Randy thought not, since something has
> to be iterated, but others think that is also silly. So we need to
> adjust the subtypes.

OTOH, approved AI12-0251-1 (which defines a chunk_specification) says:

> When the chunk_specification is elaborated, a check is made that the
> determined maximum number of logical threads of control is greater
> than zero. If this check fails, Program_Error is raised.

So here we're not allowing zero chunks. It seems weird to me that a parallel
iterator would allow zero chunks even though the underlying loop would not.

Brad had used a subtype to ensure that the routines couldn't return no chunks,
and it seems to me that that would be more consistent with the
chunk_specification.

Thoughts??

****************************************************************

From: Tucker Taft
Sent: Tuesday, November 20, 2018  9:19 AM

> So here we're not allowing zero chunks. It seems weird to me that a
> parallel iterator would allow zero chunks even though the underlying loop
> would not.

I see these as somewhat different.  One is specifying the maximum number of
chunks.  The other is returning the actual number of chunks.  It seems weird to
ever specify a maximum of zero, but it seems fine to have an actual number of
chunks being zero.  I suppose we could allow a maximum of zero if and only if
the underlying iteration has zero elements.  This would allow a computation that
divided the number of iterations by the chunk size, always rounding up.

> Brad had used a subtype to ensure that the routines couldn't return no
> chunks, and it seems to me that that would be more consistent with the
> chunk_specification.
>
> Thoughts??

I think it should be OK for the Split routine to return zero chunks if there are
no elements.  I don't believe there is a promise that it returns the maximum
number of chunks. but only that the number of chunks is no greater than the
specified maximum.  So I think things are OK as is, perhaps modulo allowing a
zero maximum if the number of elements is known to be zero as well.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, November 20, 2018  1:21 PM

The number of chunks is also the number of logical threads of control. And I
can't see how anything can be executed with zero logical threads of control,
even if there are no elements. (It usually takes some execution to determine
that there are zero elements.)

Perhaps your mental model is that the actual number of logical threads of
control is the number of chunks plus one for loop control? I don't see how that
could be derived from the wording as given; if that's your intent, there needs
to be at least an AARM note to that effect.

****************************************************************

From: Tucker Taft
Sent: Tuesday, November 20, 2018  1:54 PM

> The number of chunks is also the number of logical threads of control.
> And I can't see how anything can be executed with zero logical threads
> of control, even if there are no elements. (It usually takes some
> execution to determine that there are zero elements.)

I think that was probably why the original AI disallowed ever having a maximum
of zero.  But I think the obvious thing would be that if the maximum number of
chunks is <= 1, then you don't bother spawning any new threads.

> Perhaps your mental model is that the actual number of logical threads
> of control is the number of chunks plus one for loop control? I don't
> see how that could be derived from the wording as given; if that's
> your intent, there needs to be at least an AARM note to that effect.

I certainly don't feel that strongly about allowing a maximum of zero.  On the
other hand, you already have a thread which is elaborating the chunk
specification, so there is nothing preventing that thread from at least noticing
that the maximum is zero, and performing the check that in fact there are no
iterations to perform.  So I don't see it as a big deal either way, and I don't
really see the need for an AARM note, since the check that there is in fact
nothing to do is reasonably done before we start spawning any additional
threads.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 21, 2018  12:00 AM

If that's the logic (and that's fine), then there doesn't seem to be any reason
to disallow a maximum of zero (it just would mean that the original thread
executes the loop, if any execution is needed). I'd expect that 1 chunk would
mean the same thing (why spawn one thread and have the original thread sitting
around waiting for it?), but at least logically it would be different. Certainly
we don't want a negative maximum, so we need some sort of check in any case.

The idea would be that the number is the maximum of *extra* logical threads
(some of which might get mapped to the existing thread that is elaborating the
loop). If that's the case, then zero makes sense -- not a lot, as it essentially
would cancel the "parallel" keyword of the loop (of course, any Legality checks
would remain), but it wouldn't be nonsense.

Anyway, if we want to change this to between these consistent (and I do, one way
or the other), then we need some proposed revised wording to put into a clean-up
AI. A suggestion?

****************************************************************

From: Tucker Taft
Sent: Wednesday, November 21, 2018  10:26 AM

On further thought, I don't really think it works to have a zero maximum in the
case where there is a chunk parameter if the number of iterations is non-zero,
because what value would the chunk parameter have inside the loop?  We could
allow zero chunks when there is no specified chunk parameter, or when a check
indicates there are no iterations, but allowing zero chunks in general and just
saying the loop should be treated as a sequential loop doesn't work in the
presence of a chunk parameter.

I am just about convinced to go back to having a minimum of one chunk determined
by the chunk specification, and let the programmer use Integer'Min or equivalent
to prevent zero showing up.  I also still think it is fine for Split to return
fewer chunks than the maximum, including zero chunks.  I don't see the
contradiction.  I would say the "Maximum_Split" parameter (or whatever it is now
called) should be required to be greater than zero for consistency with this AI,
but it seems fine if Chunk_Count returns zero when there are zero elements in
the container.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 21, 2018  2:59 PM

> ... I also still think it is fine for Split to return fewer chunks
> than the maximum, including zero chunks.
> I don't see the contradiction.  I would say the "Maximum_Split"
> parameter (or whatever it is now called) should be required to be
> greater than zero for consistency with this AI, but it seems fine if
> Chunk_Count returns zero when there are zero elements in the
> container.

That only makes sense to me if we ban empty chunks otherwise. With the anything
goes scheme you are proposing, you have the overhead to check for empty loops
*and* overhead to check for zero chunks. That seems silly. (In almost all cases,
null ranges and the like require dedicated code at the machine level.)

My understanding of Brad's original proposal is that empty chunks were allowed
but there had to be at least one chunk. That makes the most sense to me, since
for some data structures it might be expensive to guarantee that elements exist
in a particular range (think graphs or trees). In that case, allowing no chunks
is just extra overhead, as that has to be checked for explicitly before
starting, and you're doing a similar check anyway for each chunk.

OTOH, if you were to not allow empty chunks, then it makes sense for an empty
container to return no chunks. In that case, the overhead of checking for empty
moves to the chunk part of the code (and not in the individual chunk loops). I
think this is not as good of a model because it would require expensive complete
walks of some data structures -- but it's not a critical issue as I don't think
it matters for any of the existing containers. (It matters mainly for
hypothetical user-defined containers, clearly a less important problem.)

Requiring both is madness -- there should only be one way to present an empty
container to the parallel iteration. But that seems to be what you have
suggested.

****************************************************************

From: Tucker Taft
Sent: Wednesday, November 21, 2018  4:19 PM

> ...
> Requiring both is madness -- there should only be one way to present
> an empty container to the parallel iteration. But that seems to be what
> you have suggested.

I guess I don't see it that way.  Both seem reasonable.  One or more empty
chunks are fine, even if there are elements in other chunks, and it is fine to
have no chunks at all when there are no elements.  They seem orthogonal.

Obviously you feel differently.  Perhaps a straw vote?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 21, 2018  6:44 PM

> I guess I don't see it that way.  Both seem reasonable.  One or more
> empty chunks are fine, even if there are elements in other chunks, and
> it is fine to have no chunks at all when there are no elements.  They
> seem orthogonal.

"Parallel" exists to improve performance. Adding extra code (and thus execution time) into the iteration scheme just because you feel it is "fine"
for the items to be empty doesn't make any sense to me. Either one or the other should be required to be non-empty so the extra overhead of the test that is inevitably required for dealing with the empty case is eliminated.
That does seem to argue that the chunks should be non-empty (as that overhead is
repeated for each chunk, while the other test only occurs once), but I don't
care a lot which is chosen.

If we were just defining an interface without any performance implications, then
I'd probably agree with you (more flexible is better). But that's not the case
here. This is the same reason that I got Brad to invert the interface so that
the iterator queries for each chunk rather than trying to return them all at
once -- we have to wring out every bit of avoidable overhead. And this is
overhead that can be avoided.

> Obviously you feel differently.  Perhaps a straw vote?

I don't think we can effectively conduct a straw poll here, as only a few people
will have followed the conversation enough to have an informed opinion. Most
likely, Brad will chose some approach and then you or I or both can argue at the
upcoming meeting for a different approach. It's only a couple of weeks from now
anyway.

****************************************************************

From: Brad Moore
Sent: Thursday, December 6, 2018  12:22 AM

I have a somewhat different perspective to offer here, as it appears that apples
and oranges are being compared, or rather, Apples and Apple slices.

I think the semantics of the chunk_specification syntax in AI12-0251-1 is about
specifying the number of logical threads of control being applied to a loop,
whereas the Split function in AI12-0266-1 is about splitting a container into
separate chunks of elements. The Advised_Split parameter to the Split call
logically makes sense to me as the recommended number of splits to be applied to
the container.

So if Advised_Split is specified as 0, then to me, it makes sense to think of
that as zero splits are applied, ie one chunk. (Which could be an empty chunk if
there are no elements in the container or iterator to be iterated.

From this, an Advised_Split of 1 would mean split the container once, (i.e two
chunks), and an Advised_Split of 2 would mean the the container twice, (i.e.
three chunks) and so on.

Thus, if N is the number of logical threads specified by the chunk_specification
in a parallel loop, it would equate to N - 1 being passed as the Advised_Split
parameter in a call to Split.

This feels like it would be simpler to explain and understand.

Otherwise if feels messy and kludgy to me to try to rationalize that 0 and 1 for
Advised_Split mean the same thing.

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, December 6, 2018  1:58 AM

> So if Advised_Split is specified as 0, then to me, it makes sense to
> think of that as zero splits are applied, ie one chunk. (Which could
> be an empty chunk if there are no elements in the container or iterator to be iterated.
>
>>From this, an Advised_Split of 1 would mean split the container once,
>>(i.e two chunks),
> and an Advised_Split of 2 would mean the the container twice, (i.e. three
> chunks) and so on.

This looks to me as opening the door to endless discussions and explanations to
users; although I'm one of these who hardly followed the discussions, it seems
to me that if the name is that ambiguous, it should be changed: call it
Advised_Chunks, there will be no discussion.

****************************************************************

From: Tucker Taft
Sent: Thursday, December 6, 2018  7:17 AM

I prefer Max_Chunks.  "Advised" is a weird term in this context, and implies
that it is a "hint" rather than a maximum.

****************************************************************

From: Randy Brukardt
Sent: Thursday, December 6, 2018  4:54 PM

That's not the question. (At least, it wasn't *my* question.) AI12-0251-1 makes
it clear that the maximum number of chunks cannot be zero, ergo 0 cannot be
passed to Max_Chunks (or whatever name for the parameter was decided upon in
Lexington). If one tries to set the maximum number of chunks to 0, Program_Error
is raised.

My concern is Split *returning* zero chunks. I see absolutely no reason to allow
that, given that we are allowing chunks to be empty. If we do allow returning
zero chunks, the loop code would necessarily need a special case to deal with
that (just like dealing with null slices and empty iterations burdens the
generated code). Typically, in Ada we have no choice about allowing zero of
something, but in this case, we do. We have to allow *something* to be empty,
but there's no need to allow *everything* to be empty. It doesn't make sense for
a loop to execute with zero threads, regardless of whether that loop does
anything. Thus, I don't believe that Split should be allowed to return zero
chunks (nor should the maximum to allowed to be zero).

****************************************************************

From: Tucker Taft
Sent: Thursday, December 6, 2018  8:27 PM

Last time we went around on this I came to agree that we should at least
disallow 0 for the "determined maximum number of chunks" in the latest update of
the AI12-0251-1 AI.

So hopefully this means we never have to allow zero for the "Max_Chunks"
parameter (or whatever we end up calling it) to Split.

As far as Split, I am fine if others believe it is better to disallow Split from
returning zero chunks.  It does seem marginally more efficient at the point of
use to not have to worry about this special case.

****************************************************************

From: Richard Wai
Sent: Wednesday, November 21, 2018  12:10 PM

I've been trying to get up to speed on the various AIs regarding parallel
iteration, so please forgive me if I missed something!

I generally take a more application-specific view on Ada. Direct use of the
generalized iteration and indexing facilities by the programmer can be extremely
useful in a lot of cases. We have actually used these quite a bit in our work.
So with that perspective, I'm pretty interested in continuity/clarity in
implementing parallelized iteration.

It's probably safe to say that Chunk_Finish is related to Has_Element. Yet it's
name suggest a disconnect, which could be confusing, especially for direct
programmer application. Where one iterates until Has_Element returns False*, it
could seem that one iterates on a Chunk until Chunk_Finished returns True.

* I note here that the original AI does suggest that Chunk_Finished returns
False at the end of a Chunk, and I don't see any specific amendment that changes
this. This is obviously confusing, as one would expect Chunk_Finish to return
True when the Chunk is finished, though I recognise that we iterate until
Has_Element returns False, so that would be consistent.

So here is my humble suggestion:

Change Chunk_Finished to Chunk_Has_Element, and specify that it returns True if
and only if Position is a member of the chunk indexed by Chunk.

This also makes implementation of Chunk_Index a bit easier, as it can simply
specify a first/last Cursor, which Chink_Has_Element can check against easily
enough.

-- As a further suggestion, in this case, I think Start_Of_Chunk should then
also be renamed to Chunk_First, for consistency.

****************************************************************

From: Tucker Taft
Sent: Wednesday, November 21, 2018  12:21 PM

Good suggestions, Richard!

****************************************************************

From: Richard Wai
Sent: Wednesday, November 21, 2018  12:52 PM

If only I had run my email through the Ada compiler first, it would have found
all of my typos!

: "Chunk_Finish" is undefined
: "Chink_Has_Element" is undefined

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 21, 2018  2:46 PM

> Good suggestions, Richard!

I concur, but a caution: many AIs were extensively changed by the discussions at
the October ARG meeting, but neither the minutes nor the updated AIs (in the
case where they were approved) are available yet. (I'm working on those as hard
as I can with the holiday week.) So you're necessarily looking at obsolete
versions of the AIs.

In this particular case, we changed many of the names of the routines, as well
as many of the parameters, but not the specific one you are concerned about.

****************************************************************

From: Brad Moore
Sent: Wednesday, January 9, 2019  3:40 PM

I was looking at my test program for the Parallel Iterator proposal, and it was
bothering me that the implementation for the parallel loop was issuing two calls
to the interface for every iteration.

         Process_Loop :
         while not Chunk_Finished (...) loop

            Loop_Body;

            Position := Iterator.Next (Position);
         end loop Process_Loop;

I am viewing subprogram calls into an interface as being expensive.
I believe this can be reduced to a single call.

So I am proposing modifying the Start_Of_Chunk call to have a Boolean out
parameter, End_Of_Chunk,

And then replacing the Chunk_Finished call with an override of Next that take a
Chunk_Index as an input and has an out parameter, End_Of_Chunk.

as in;

   function Start_Of_Chunk
     (Object       : Parallel_Iterator;
      Chunk        : Chunk_Index;
      End_Of_Chunk : out Boolean) return Cursor is abstract
     with Pre'Class => Object.Is_Split and then Chunk <= Object.Chunk_Count;

   function Next
     (Object       : Parallel_Iterator;
      Position     : Cursor;
      Chunk        : Chunk_Index;
      End_Of_Chunk : out Boolean) return Cursor is abstract
     with Pre'Class => Object.Is_Split and then Chunk <= Object.Chunk_Count;


Then the implementation loop would look something like;

      procedure Chunk_Code
        (Chunk : My_Interface.Chunk_Index)
      is
         End_Of_Chunk : Boolean := False;

         Position : Employee_Lists.Cursor :=
           Iterator.Start_Of_Chunk (Chunk,
                                    End_Of_Chunk);
      begin --

         while not End_Of_Chunk loop

            -- Loop_Body

            Position := Iterator.Next (Position, Chunk, End_Of_Chunk);
         end loop Process_Loop;

         --  Finalize
      end Chunk_Code;

I believe this would potentially improve the execution time of the loop, which
is the main point for parallel iterators, so unless I hear otherwise, I am
proceeding with this change to the Parallel Iterator interface in the writeup of
the AI.

Comments?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, January 9, 2019  3:58 PM

> I am viewing subprogram calls into an interface as being expensive.
> I believe this can be reduced to a single call.

I suppose, but that would be different than the usual iterators. It seems best
for everyone if these work as similarly as possible. I note that this particular
interface can be implemented rather like an abstract type, at least in the usual
use cases, so dispatching doesn't have to be as expensive as it is in cases of
multiple inheritance.

> So I am proposing modifying the Start_Of_Chunk call to have a Boolean
> out parameter, End_Of_Chunk,
>
> And then replacing the Chunk_Finished call with an override of Next
> that take a Chunk_Index as an input and has an out parameter,
> End_Of_Chunk.

Huh? You can't "override" a routine and change the profile. That's called
"overloading". And Ada's rules would require that you still implement the older
(and now unused) Next, which doesn't seem great for usability.

Seriously, Brad, we're far beyond the point where you ought to be tinkering with
the interface itself. We discussed the interface and asked for a bunch of name
and subtype changes, not a wholesale overhaul. (And then we requested a few
addition changes in e-mail threads last month.) We can never finish these things
if you redesign them every time you look at them - we've already agreed to a
design.

****************************************************************

From: Brad Moore
Sent: Wednesday, January 9, 2019  4:28 PM

>> I am viewing subprogram calls into an interface as being expensive.
>> I believe this can be reduced to a single call.
>
> I suppose, but that would be different than the usual iterators. It
> seems best for everyone if these work as similarly as possible. I note
> that this particular interface can be implemented rather like an
> abstract type, at least in the usual use cases, so dispatching doesn't
> have to be as expensive as it is in cases of multiple inheritance.

Its not necessarily the dispatching that I am concerned about. It's been my
experience that subprogram calls in general are expensive. Adding an extra call
for every iteration seems a lot worse than the concern you had about checking to
see at the beginning of the loop whether there were zero iterations, which you
were quite adamant about eliminating the overhead for.

And using your argument, if performance weren't the main point of this feature,
then I would agree with you, but since we want to in theory squeeze out whatever
performance we can, it seems we should be eliminating this overhead, since it is
possible to do so.

>> So I am proposing modifying the Start_Of_Chunk call to have a Boolean
>> out parameter, End_Of_Chunk,
>>
>> And then replacing the Chunk_Finished call with an override of Next
>> that take a Chunk_Index as an input and has an out parameter,
>> End_Of_Chunk.
>
> Huh? You can't "override" a routine and change the profile. That's
> called "overloading". And Ada's rules would require that you still
> implement the older (and now unused) Next, which doesn't seem great for usability.

Sorry, overload is what I meant.

And actually, one of Tuckers comments was to rename Start_Of_Chunk to
First_In_Chunk, and then have a subprogram called Next_In_Chunk, rather than
Chunk_Finished in the name of making the usage look more similar to other
interfaces.

So there actually would not any overload, and having the Next_In_Chunk is
actually addressing a comment in the minutes from our last discussion of this
AI. Plus, I think the First_In_Chunk, Next_In_Chunk calls end up looking more
similar to other iterator interfaces this way.

Plus it would not make sense to have a call Next_In_Chunk at the top of the
loop. It needs to be at the end of the loop to make sense.

So, I really think the changes I am proposing fall in scope for what was asked
in the last discussion.

****************************************************************

From: Brad Moore
Sent: Wednesday, January 9, 2019  4:43 PM

Also, in the name of addressing the comments from the Lexington meeting, I would
eliminate the End_Of_Chunk boolean output parameters, and instead return a NULL
Cursor if we are at the end of the chunk. Which makes the interface look even
more similar to the other iterator interfaces.

****************************************************************

From: Brad Moore
Sent: Wednesday, January 9, 2019  4:56 PM

Note that by returning a null cursor, though, we'd be requiring a call to
Has_Element at the top of the loop

eg

while Has_Element (Position) loop

So we are back to having two calls per iteration.

I suppose a compile could inline that call though. Right?

In which case, I think case, I think my concerns would be addressed, as well as
addressing comments raised against the AI at the last discussion of the AI.

****************************************************************

From: Tucker Taft
Sent: Wednesday, January 9, 2019  4:02 PM

By all means please make this look as much as possible like a normal iterator.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, January 9, 2019  5:22 PM

Agreed. To the extent that some other interface could have provided better
performance, that's a mistake that was made when the original iterators were
designed and it's rather too late to change that now. (As I recall, the intent
was that the iterator interface would operate similarly to the way one would
write an iterator explicitly using the operations of the containers. Originally,
the iterators shared the implementation with the containers, but since that
didn't work for multiple iterations on the same object, we abandoned that model
but never really rethought the interface.)

****************************************************************

From: Brad Moore
Sent: Wednesday, January 9, 2019  11:36 PM

Attached is my homework for AI12-0266-01, Parallel Container Iterators.
[This is version /07 of the AI - Editor.]

The summary of the main changes are;

- Changed Nonblocking of everything in the Iterator_Interfaces to depend on the
  Nonblocking of the generic formal parameters
- Removed the Iterations call from the Parallel interfaces
- Added Is_Split to use as precondition for the parallel iterator calls, and
  raises Program_Error if Split_Into_Chunks is called more than once, or if the
  other calls are made without having called Split_Into_Chunks
- Renamed parameter Maximum_Split of Split_Into_Chunks to Max_Chunks
   A number of splits is confusing. If number of splits is 1, does that mean 1
   or 2 chunks. Renaming to Max_Chunks eliminates the confusion.
- Renamed Start_Of_Chunk to First_In_Chunk
- Replaced Chunk_Finished call with Last_In_Chunk so the iteration looks much
  more similar to the existing iterator interfaces
- Added a Last_In_Chunk and Previous_In_Chunk call for
  Parallel_Reversible_Iterator_Interfaces

****************************************************************


Questions? Ask the ACAA Technical Agent