!standard 5.5.1(4/3) 19-01-10 AI12-0266-1/08 !standard 5.5.1(6/4) !standard 5.5.1(11/3) !standard 5.5.2(4/3) !class Amendment 18-03-28 !status work item 18-03-28 !status received 18-03-28 !priority Medium !difficulty Medium !subject Parallel Container Iterators !summary Parallel iterators are defined for containers. !problem AI12-0119 provides mechanisms that can be used to iterate in parallel over the elements of an array. One would expect that there would be a similar need to be able to iterate over the elements of a container in parallel. Should such a capability be provided in Ada? (Yes). !proposal This proposal depends on the facilities for parallel loops (AI12-0119-1), chunk specifications (AI12-0251-1), and for aspect Nonblocking (AI12-0064-2). The goals of this proposal are: - To extend parallel iteration to the standard container libraries and user defined containers; - To permit existing programs to benefit from parallelism through minimal restructuring of sequential programs into ones that permit more parallel execution; - To support the development of complete programs that maximize parallelism; - To efficiently guide the compiler in the creation of effective parallelism with minimal input from the programmer; - To avoid syntax that could generate erroneous behaviour or produce noticeably different results if executed sequentially vs in parallel. This proposal extends the interfaces in Ada.Containers.Iterator_Interfaces to include two new interfaces, a parallel iterator interface, and a reversible parallel iterator interface. Such iterators derived from these interfaces can be used for parallel iteration, but can also be used for sequential iteration for loops that do not have the parallel reserved keyword. A parallel iterator provides a Split_Into_Chunks procedure that can be used to associate with the iterator a set of cursors that define the chunk boundaries for parallel iteration. Each cursor can be assigned to a different logical thread of control and identifies the first element of a different chunk of iterations where each chunk can be iterated through in parallel with other chunks to collectively cover all the elements of the container. The starting cursor for each chunk of iteration can be obtained by calling the First_In_Chunk operation, passing in the chunk index of the split as a parameter to the call. The last cursor for each chunk of iteration can be obtained by calling the Last_In_Chunk operation, passing in the chunk index of the split as a parameter to the call. Similarly, the Next_In_Chunk and Previous_In_Chunk calls return the next or previous cursor in the chunk, and return a null cursor if the cursor is advanced past the last or before the first cursor of a chunk, respectively. In addition, the loop syntax is extended to allow the loops with iterator specifications to have the parallel keyword. !wording Replace 5.5(3/5) with: iteration_scheme ::= while condition | for loop_parameter_specification | for iterator_specification | parallel [(chunk_specification)] for loop_parameter_specification | parallel [(chunk_specification)] for iterator_specification Add after 5.5.1(4/3): type Parallel_Iterator is limited interface and Forward_Iterator with Nonblocking => Cursor'Nonblocking and Has_Element'Nonblocking; subtype Chunk_Index is Positive; function Is_Split (Object : Parallel_Iterator) return Boolean is abstract with Nonblocking => Parallel_Iterator'Nonblocking; procedure Split_Into_Chunks (Object : in out Parallel_Iterator; Max_Chunks : Chunk_Index) is abstract with Nonblocking => Parallel_Iterator'Nonblocking, Pre'Class => not Object.Is_Split or else raise Program_Error, Post'Class => Object.Is_Split and then Object.Chunk_Count <= Max_Chunks; function Chunk_Count (Object : Parallel_Iterator) return Chunk_Index is abstract with Nonblocking => Parallel_Iterator'Nonblocking, Pre'Class => Object.Is_Split or else raise Program_Error; function First_In_Chunk (Object : Parallel_Iterator; Chunk : Chunk_Index) return Cursor is abstract with Nonblocking => Parallel_Iterator'Nonblocking, Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error; function Next_In_Chunk (Object : Parallel_Iterator; Position : Cursor; Chunk : Chunk_Index) return Cursor is abstract with Nonblocking => Parallel_Iterator'Nonblocking, Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error; overriding function First Object : Parallel_Iterator) return Cursor is abstract with Nonblocking => Parallel_Iterator'Nonblocking; overriding function Next (Object : Parallel_Iterator; Position : Cursor) return Cursor is abstract with Nonblocking => Parallel_Iterator'Nonblocking; [Editor's note: We need the above overridings to make these routines Nonblocking, else they would allow blocking as inherited from their ancestors.] type Parallel_Reversible_Iterator is limited interface and Parallel_Iterator and Reversible_Iterator with Nonblocking => Parallel_Iterator'Nonblocking; function Last_In_Chunk (Object : Parallel_Reversible_Iterator; Chunk : Chunk_Index) return Cursor is abstract with Nonblocking => Parallel_Reversible_Iterator'Nonblocking, Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error; function Previous_In_Chunk (Object : Parallel_Reversible_Iterator; Position : Cursor; Chunk : Chunk_Index) return Cursor is abstract with Nonblocking => Parallel_Reversible_Iterator'Nonblocking, Pre'Class => (Object.Is_Split and then Chunk <= Object.Chunk_Count) or else raise Program_Error; overriding function Last (Object : Parallel_Reversible_Iterator) return Cursor is abstract with Nonblocking => Parallel_Reversible_Iterator'Nonblocking; overriding function Previous (Object : Parallel_Reversible_Iterator; Position : Cursor) return Cursor is abstract with Nonblocking => Parallel_Reversible_Iterator'Nonblocking; Modify 5.5.1(6/3): An iterator type is a type descended from the Forward_Iterator interface from some instance of Ada.Iterator_Interfaces. A reversible iterator type is a type descended from the Reversible_Iterator interface from some instance of Ada.Iterator_Interfaces. {A parallel iterator type is a type descended from the Parallel_Iterator interface from some instance of Ada.Iterator_Interfaces. A parallel reversible iterator type is a type descended from the Parallel_Reversible_Iterator interface from some instance of Ada.Iterator_Interfaces.} An iterator object is an object of an iterator type. A reversible iterator object is an object of a reversible iterator type. {A parallel iterator object is an object of a parallel iterator type. A parallel reversible iterator object is an object of a parallel reversible iterator type.} The formal subtype Cursor from the associated instance of Ada.Iterator_Interfaces is the iteration cursor subtype for the iterator type. Modify 5.5.1(11/3): An iterable container type is an indexable container type with specified Default_Iterator and Iterator_Element aspects. A reversible iterable container type is an iterable container type with the default iterator type being a reversible iterator type. {A parallel iterable container type is an iterable container type with the default iterator type being a parallel iterator type. A parallel reversible iterable container type is a reversible iterable container type with the default iterator type being a parallel reversible iterator type.} An iterable container object is an object of an iterable container type. A reversible iterable container object is an object of a reversible iterable container type. {A parallel iterable container object is an object of a parallel iterable container type. A parallel reversible iterable container object is an object of a parallel reversible iterable container type.} Modify 5.5.2(4/3): If the reserved word reverse appears, the iterator_specification is a reverse iterator{.}[;] {If the reserved word parallel appears, the iterator_specification is a parallel iterator;} otherwise it is a forward iterator. In a reverse generalized iterator, the iterator_name shall be of a reversible iterator type. {In a parallel generalized iterator, the iterator_name shall be of a parallel iterator type or a parallel reversible iterator type.} In a reverse container element iterator, the default iterator type for the type of the iterable_name shall be a reversible iterator type. {In a parallel container element iterator, the default iterator type for the type of the iterable_name shall be of a parallel iterator type or a parallel reversible iterator type.} Modify AARM 5.5.2(8.a/5): Ramification: The loop parameter{s} of a generalized iterator {have}[has] the same accessibility as the loop statement. This means that the loop parameter object{s} {are}[is] finalized when the loop statement is left. ([It]{They} also may be finalized as part of assigning a new value to the loop parameter{s}.) For array component iterators, the loop parameter{s each} directly denotes an element of the array and [has]{have} the accessibility of the associated array. For container element iterators, the loop parameter{s each} denotes the result of [the]{an} indexing function call (in the case of a constant indexing) or a generalized reference thereof (in the case of a variable indexing). Roughly speaking, the loop parameter{s have} [has] the accessibility level of a single iteration of the loop. More precisely, the function result (or the generalized reference thereof) is considered to be renamed in the declarative part of a notional block statement which immediately encloses the loop's sequence_of_statements; the accessibility of the loop parameter{s are} [is] that of the block statement. Modify 5.5.2(10/3): For a generalized iterator, the loop parameter{s are} [is] created, the iterator_name is evaluated, and the denoted iterator object becomes the loop iterator. In a forward generalized iterator, the operation First of the iterator type is called on the loop iterator, to produce the initial value for the loop parameter. If the result of calling Has_Element on the initial value is False, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed and then the Next operation of the iterator type is called with the loop iterator and the current value of the loop parameter to produce the next value to be assigned to the loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control. For a reverse generalized iterator, the operations Last and Previous are called rather than First and Next. Add after 5.5.2(10/3): For a parallel generalized iterator, the operation Split_Into_Chunks of the iterator type is then called, with the Max_Chunks parameter specified as the upper bound for the number of loop parameter objects (and the number of logical threads of control) to be associated with the iterator. The number of loop parameters that were created is determined by calling the Chunk_Count operation of the iterator. Each loop parameter can be assigned to a different logical thread of control, with an ordinal chunk index value. The Initial cursor values to be associated with each loop parameter is determined by calling the First_In_Chunk operation of the iterator using the chunk index value of the loop parameter object as the Chunk parameter for the call. The sequence_of_statements is executed for each loop parameter object and the Next_In_Chunk operation of the iterator type is called with the chunk index value of the loop parameter object and the current value of the loop parameter to produce the next value to be assigned to a given loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control. For a reverse parallel generalized iterator, the operations Last_In_Chunk and Previous_In_Chunk are called rather than First_In_Chunk and Next_In_Chunk. AARM Note The number of loop parameters specified for the Max_Chunks parameter of the Split_Into_Chunks call is only a recommendation by the implementation. A container implementation may choose a lower value if a more optimal split can be determined. For instance, a tree based container might create the split based on the number of branches at the top levels of the tree. Modify AARM 5.5.2(10.a/4): Ramification: The loop parameter{s} of a generalized iterator [is a]{are} variable{s} of which the user only has a constant view. It follows the normal rules for a variable of its nominal subtype. In particular, if the nominal subtype is indefinite, the variable is constrained by its initial value. Similarly, if the nominal subtype is class-wide, the variable (like all variables) has the tag of the initial value. Constraint_Error may be raised by a subsequent iteration if Next [or]{,} Previous{, Next_In_Chunk, or Previous_In_Chunk} return an object with a different tag or constraint. Modify 5.5.2(11/3): For an array component iterator, the iterable_name is evaluated and the denoted array object becomes the array for the loop. If the array for the loop is a null array, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed with [the] loop parameter{s that collectively denote} [denoting] each component of the array for the loop{. If the iterator is not a parallel array component iterator, the loop is iterated} using a canonical order of components, which is last dimension varying fastest (unless the array has convention Fortran, in which case it is first dimension varying fastest). For a forward array component iterator, the iteration starts with the component whose index values are each the first in their index range, and continues in the canonical order. For a reverse array component iterator, the iteration starts with the component whose index values are each the last in their index range, and continues in the reverse of the canonical order. {For a parallel array component iterator, the order of iteration is arbitrary.} The loop iteration proceeds until the sequence_of_statements has been executed for each component of the array for the loop, or until the loop is left as a consequence of a transfer of control. Modify 5.5.2(12/3): For a container element iterator, the iterable_name is evaluated and the denoted iterable container object becomes the iterable container object for the loop. The default iterator function for the type of the iterable container object for the loop is called on the iterable container object and the result is the loop iterator. [An object] {Objects} of the default cursor subtype [is]{are} created (the loop cursor{s}). Modify 5.5.2(13/3): [into two paragraphs] For a forward container element iterator, the operation First of the iterator type is called on the loop iterator, to produce the initial value for the loop cursor. If the result of calling Has_Element on the initial value is False, then the execution of the loop_statement is complete. Otherwise, the sequence_of_statements is executed with the loop parameter denoting an indexing (see 4.1.6) into the iterable container object for the loop, with the only parameter to the indexing being the current value of the loop cursor; then the Next operation of the iterator type is called with the loop iterator and the loop cursor to produce the next value to be assigned to the loop cursor. This repeats until the result of calling Has_Element on the loop cursor is False, or until the loop is left as a consequence of a transfer of control. For a reverse container element iterator, the operations Last and Previous are called rather than First and Next.{ For a parallel container element iterator, the operation Split_Into_Chunks of the iterator type is called, with the Max_Chunks parameter specified as the upper bound for the number of loop parameter objects to be associated with the iterator. The number of loop parameters that were created is determined by calling the Chunk_Count operation of the iterator. Each loop parameter can be assigned to a different logical thread of control, with an ordinal chunk index value. The Initial cursor values to be associated with each loop parameter is determined by calling the First_In_Chunk operation of the iterator using the chunk index value of the loop parameter object as the Chunk parameter for the call. The sequence_of_statements is executed for each loop parameter object and the Next_In_Chunk operation of the iterator type is called with the chunk index value of the loop parameter object and the current value of the loop parameter to produce the next value to be assigned to a given loop parameter. This repeats until the result of calling Has_Element on the loop parameter is False, or the loop is left as a consequence of a transfer of control.} If the loop parameter is a constant (see above), then the indexing uses the default constant indexing function for the type of the iterable container object for the loop; otherwise it uses the default variable indexing function. Modify 5.5.2(15/3): -- Array component iterator example: {parallel} for Element of Board loop -- See 3.6.1. Element := Element * 2.0; -- Double each element of Board, a two-dimensional array. end loop; Modify A.18.2(74.1/3): function Iterate (Container : in Vector) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.2(74.2/3): function Iterate (Container : in Vector; Start : in Cursor) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.2(230.1/3): function Iterate (Container : in Vector) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.2(230.2/3): Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor as per the Next function when used as a forward iterator, and starting with the last node and moving the cursor as per the Previous function when used as a reverse iterator {, and starting with all nodes simultaneously using the Split function to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.2(230.3/3): function Iterate (Container : in Vector; Start : in Cursor) return Vector_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.2(230.4/3): If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel }reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor as per the Next function when used as a forward iterator, or moving the cursor as per the Previous function when used as a reverse iterator {, or all nodes simultaneously using the Split function when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.3(46.1/3): function Iterate (Container : in List) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.3(46.2/3): function Iterate (Container : in List; Start : in Cursor) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.3(144.1/3): function Iterate (Container : in List) return List_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.3(144.2/3): Iterate returns a {parallel }reversible iterator object (see 5.5.1) that will generate a value for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor as per the Next function when used as a forward iterator, and starting with the last node and moving the cursor as per the Previous function when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.3(144.3/3): function Iterate (Container : in List; Start : in Cursor) return List_Iterator_Interfaces.Parallel_Reversible_Iterator'Class; Modify A.18.3(144.4/3): If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor as per the Next function when used as a forward iterator, or moving the cursor as per the Previous function when used as a reverse iterator {or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.5(37.1/3): function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}[Forward_]Iterator'Class; Modify A.18.5(61.1/3): function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.5(61.2/3): Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor according to the successor relation {when used as a forward iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.6(51.1/3): function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.6(51.2/3): function Iterate (Container : in Map; Start : in Cursor) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.6(94.1/3): function Iterate (Container : in Map) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.6(94.2/3): Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the first node and moving the cursor according to the successor relation when used as a forward iterator, and starting with the last node and moving the cursor according to the predecessor relation when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.6(94.3/3): function Iterate (Container : in Map; Start : in Cursor) return Map_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.6(94.4/3): If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each node in Container, starting with the node designated by Start and moving the cursor according to the successor relation when used as a forward iterator, or moving the cursor according to the predecessor relation when used as a reverse iterator {, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.8(49.1/3): function Iterate (Container : in Set) return Set_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.8(85.1/3): function Iterate (Container : in Set) return Set_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.8(85.2/3): Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the first element and moving the cursor according to the successor relation {when used as a forward iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator.} Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.9(61.1/3): function Iterate (Container : in Set) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.9(61.2/3): function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.9(113.1/3): function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.9(113.2/3): Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the first element and moving the cursor according to the successor relation when used as a forward iterator, and starting with the last element and moving the cursor according to the predecessor relation when used as a reverse iterator {, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.9(113.3/3): function Iterate (Container : in Set; Start : in Cursor) return Set_Iterator_Interfaces.{Parallel_}Reversible_Iterator'Class; Modify A.18.9(113.4/3): If Start is not No_Element and does not designate an item in Container, then Program_Error is propagated. If Start is No_Element, then Constraint_Error is propagated. Otherwise, Iterate returns a {parallel} reversible iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in Container, starting with the element designated by Start and moving the cursor according to the successor relation when used as a forward iterator, or moving the cursor according to the predecessor relation when used as a reverse iterator {, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.10(44/3): function Iterate (Container : in Tree) return Tree_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.10(45/3): function Iterate_Subtree (Position : in Cursor) return Tree_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.10(156/3): function Iterate (Container : in Tree) return Tree_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.10(157/4) Iterate returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element node in Container, starting from with the root node and proceeding in a depth-first order {when used as a forward_iterator, and starting with all nodes simultaneously using the Split_Into_Chunks procedure to generate cursors for all the iterations of the loop when used as a parallel iterator}. Tampering with the cursors of Container is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. Modify A.18.10(158/3): function Iterate_Subtree (Position : in Cursor) return Tree_Iterator_Interfaces.{Parallel}[Forward]_Iterator'Class; Modify A.18.10(159/3): If Position equals No_Element, then Constraint_Error is propagated. Otherwise, Iterate_Subtree returns a[n] {parallel} iterator object (see 5.5.1) that will generate [a] value{s} for [a] loop parameter{s} (see 5.5.2) designating each element in the subtree rooted by the node designated by Position, starting from with the node designated by Position and proceeding in a depth-first order {when used as a forward iterator, or all nodes simultaneously using the Split_Into_Chunks procedure when used as a parallel iterator}. If Position equals No_Element, then Constraint_Error is propagated. Tampering with the cursors of the container that contains the node designated by Position is prohibited while the iterator object exists (in particular, in the sequence_of_statements of the loop_statement whose iterator_specification denotes this object). The iterator object needs finalization. !discussion Containers such as vectors are obvious candidates for parallelization, since it is easy to conceptually think of a vector as a being splittable into a number of smaller vectors, and then applying a divide and conquer approach to the vector subsets. However, even a sequentially accessed container such as a linked list can be iterated in parallel if cursors for each parallel executor can be obtained prior to execution of the loop. This typically involves iterating once through the container to obtain the cursors, but can still benefit from parallel processing if the amount of time to process each node in the loop is significantly greater than the amount of time needed to iterate through the container. The container writer has the ability to implement the container to define an optimal number of loop parameters to be setup by the Split_Into_Chunks call, so long as the number of splits is not greater than the value of the Max_Chunks parameter. In AI12-0119 we admitted the possibility of chunks, but the model was always one logical thread of control per iteration. Here (and in AI12-0251-1) we make chunks explicit in the model, with one logical thread of control per chunk. This is important to make it clear that all of the iterations associated with a chunk are part of a single logical thread of control, so there is no data race between the iterations of a single chunk. This proposal allows the container implementer a fair amount of flexibility in deciding how to implement the cursor array stored in the iterator object. In general, an implementor should allow for as many elements in the cursor array, as there are cores on the multicore processor. But one could still decide to use a fixed sized array with enough elements to provide good parallelism for most typical processors available today. For instance, a fixed size cursor array of size 64 elements might be a reasonable size to provide good parallelism for most uses. On the other hand, it might make sense to allocate the cursor array dynamically during the call to Split. Then the array can be sized to match any number of cores. Another advantage of dynamic allocation is that the storage would not be needed for iterators associated with non-parallel loops. The iterator types of the containers are controlled types, so finalization of the iterator objects is already part of the iteration model. It would have been nice to specify a post condition for the Split_Into_Chunks call something like; with Post => (for all I in 1 .. Object.Split_Count => Has_Element (Object.Get_Cursor (I))) However, this is not allowed because Has_Element cannot be called during a post condition in this package because the Cursor formal type is an untagged incomplete type. Similarly, it would have been nice to write a Post condition for First_In_Chunk where calling Has_Element on the result would return true, however this is also not allowed for the same reason. We could consider adding these as Pre and Post contracts in the derived types in each of the containers, but the iterator types are not declared in the public part of the package, so it seems to be not worthwhile to do so. !ASIS ** TBD. !ACATS test ACATS B- and C-Tests are needed to check that the new capabilities are supported. !appendix From: Brad Moore Sent: Wednesday, March 28, 2018 12:06 AM Here is a new AI split off from AI12-0119 (Parallel Operations). [This is version /01 of the AI - ED] This one deals specifically with parallel container iteration. The writeup is very much the same as before. **************************************************************** From: Tucker Taft Sent: Wednesday, March 28, 2018 9:11 PM Thanks. I am thinking this should probably also be in Chapter 9 after introducing the notion of tasklets, rather than trying to weave it into the current discussion of iterators, which is already pretty complex. **************************************************************** From: Brad Moore Sent: Thursday, June 14, 2018 1:15 AM Here is an update to AI12-0266, [This is version /02 of the AI - Editor.] The main changes are; - Changed Chunk_Array type to be called Cursor_Array type. The reason is that "Chunk" is not well defined in the RM, and an array of chunks leaves a bit to the imagination. A cursor array however seems a lot easier to visualize. The semantics of Chunk_Array was that for every chunk there was a Start and Finish call that could be used to determine the first and last cursor of each chunk. One problem with this is that it is not clear if the last cursor is inclusive or exclusive to the set of cursors to be iterated over. With a cursor array, there is only one cursor per chunk, and only one call, To_Cursor, available to get the starting cursor. To get the end cursor you use the next highest index value, which should be more obvious that this is an exclusive cursor value. The idea also is that if you specify an index value higher than the number of chunks in the cursor array, you get a No_Element result. So calling To_Cursor with index + 1 consistently gives you a cursor value that can be compared against for chunk iteration termination. Another change is that the Split operation is no longer a function returning an object of a class-wide type. Instead, Split is a procedure that has an out parameter of a class-wide type. (A change requested by Tucker) By doing this, I think there needed to be a way for the container implementer to tell the compiler which type to use for declaring the cursor array type. So I added an Iterator_Cursor_Array aspect, similar to Iterator_Element aspect, that may be specified on a container type declaration for this purpose. Since the implementation is now responsible for declaring the cursor array type, the idea is that the number of elements of the cursor array is now the means to indicate to the container implementer, how many chunks to create. A container implementer can use less than the number of elements of the cursor array, but then must leave the remaining elements set as No_Element. As a result of this, there no longer needs to be an Advised_Split parameter to the Split call. Another change is that since the Split function needs to store cursors into an out parameter Cursor_Array class-wide object, there needed to be a way to store a cursor into a cursor array object. So I added a Set_Cursor procedure to the Cursor_Array interface. **************************************************************** From: Brad Moore Sent: Thursday, June 14, 2018 10:30 AM A question I have about this approach, is it enough to use the Iterator_Cursor_Array aspect to tell the compiler how to set up the cursor array object? Specifically, in my test example using Ada.Containers.Doubly_Linked_Lists, I have the cursor array type having a visible discriminant to set the length of the array. Do we need to specify in the RM that it is expected that all cursor array types will have a single discriminant of a type (Natural) indicating the number of elements in a cursor array? Or maybe we need another aspect to somehow tell the compiler how to specify the cursor array length? Or we could add a Set_Length function to the interface, but then that likely implies a controlled type with dynamic memory allocation, which seems to be undesirable. This also has me wondering if we might have been better off leaving the Split subprogram as a function rather than a procedure. Then the container implementor can create tne cursor array object inside the Split call with the desired length. We can then also simplify the interface by eliminating the Set_Cursor call. I note also that we do have other iterator calls that are functions returning class-wide types, such as Iterate in the containers. **************************************************************** From: Randy Brukardt Sent: Thursday, June 14, 2018 1:46 PM > A question I have about this approach, is it enough to use the > Iterator_Cursor_Array aspect to tell the compiler how to set up the > cursor array object? Let's start further back. Why do we need a Iterator_Cursor_Array object at all? There would have been value to that approach if you could have declared an actual array (fewer subprogram calls), but since it has to be private, getting each cursor is going to involve a subprogram call. So, what is the separate object gaining over just putting the Get_Cursor call directly on the iterator object (the implementation of which is totally user-controlled, so it could have an array of cursors internally if it wants)? The "user" of this structure is the parallel loop, and it obviously has an iterator object it is going to use to iterate. Why does it need anything more?? So I would suggest that Split is just a call to tell the iterator how to organize its internal data structure. Still need the Advised_Split parameter, something like: procedure Split (Object : in out Parallel_Iterator; Advised_Split : in Natural) is abstract; -- Tell Object to create split data with the advised number of splits. function Split_Length (Object : in Parallel_Iterator) return Natural is abstract; -- Determine the actual split length. function Split_Cursor (Object : in Parallel_Iterator; Split_Index : in Natural) return Cursor; or, since it would save an assignment in many implementations: procedure Split_Cursor (Object : in Parallel_Iterator; Split_Index : in Natural; Result : out Cursor); Either of these would have the semantics you suggested yesterday (chunk ends with the 'Succ index, get No_Element if Split_Index >= Length). Do you need anything else? **************************************************************** From: Randy Brukardt Sent: Thursday, June 14, 2018 8:10 PM > > A question I have about this approach, is it enough to use the > > Iterator_Cursor_Array aspect to tell the compiler how to set up the > > cursor array object? > > Let's start further back. Why do we need a Iterator_Cursor_Array > object at all? Sorry, I meant Cursor_Array. I couldn't convince Outlook to let me view your AI update in anything other than Notepad (which just shows a giant block of text; it doesn't recognize Linux line endings) so I had to go from a quick scan of your AI when I wrote this this afternoon (hoping for quick turnaround). Now that I've posted it, I see what I did wrong here. ... > So, what is the separate object gaining over just putting the > Get_Cursor call directly on the iterator object (the implementation of > which is totally user-controlled, so it could have an array of cursors > internally if it wants)? Answering my own question based on your most recent wording: I don't see any. If we let the iterator implementor create a (possibly virtual) array inside their implementation, we don't need any explicit object. It seems to make *everything* simpler (no strange new aspect, no questions about where/when the object gets declared, less to talk about in how the iterator executes, etc.) > The "user" of this structure is the parallel loop, and it obviously > has an iterator object it is going to use to iterate. Why does it need > anything more?? The actual implementation probably has to give a start/end cursor to each thread when it is given work, but the format of that should be 100% up to the implementation. The less we specify, the more likely it is that an implementer can optimize it and create the fastest possible code. > So I would suggest that Split is just a call to tell the iterator how > to organize it's internal data structure. Still need the Advised_Split > parameter, something like: > > procedure Split > (Object : in out Parallel_Iterator; > Advised_Split : in Natural) is abstract; > -- Tell Object to create split data with the advised number of splits. Note that the internal cursor array (inside of a Parallel_Iterator'Class object) could be fixed length (meaning that no more splits than the max number is supported) or it could be dynamically allocated. I don't see much reason to support massive numbers of splits anyway, certainly not to require it. (Perhaps we might give a minimum number in Implementation Advice?) > function Split_Length (Object : in Parallel_Iterator) return > Natural is abstract; > -- Determine the actual split length. We'd need say what happens here if procedure Split hasn't been called. I'd suggest saying that it returns 1. > function Split_Cursor (Object : in Parallel_Iterator; > Split_Index : in Natural) return Cursor; > > or, since it would save an assignment in many implementations: > > procedure Split_Cursor (Object : in Parallel_Iterator; > Split_Index : in Natural; > Result : out Cursor); > > Either of these would have the semantics you suggested yesterday > (chunk ends with the 'Succ index, get No_Element if Split_Index >= > Length). Again, if Split hasn't been called, the first cursor is Object.First, and the rest are No_Element. > Do you need anything else? I don't think so, based on my reading of your current text. Assuming you concur, if you could update the AI this way before the deadline, it would help progress it in Lisbon. **************************************************************** From: Brad Moore Sent: Thursday, June 14, 2018 10:54 PM Thanks Randy, the idea of hiding the cursor array inside the iterator hadn't occurred to me, but it looks like it does simplify things a lot, as well as gives more flexibility to the implementation. I will attempt to update the AI to include these ideas. **************************************************************** From: Brad Moore Sent: Friday, June 15, 2018 11:22 AM > -- Determine the actual split length. > > function Split_Cursor (Object : in Parallel_Iterator; > Split_Index : in Natural) return Cursor; > > or, since it would save an assignment in many implementations: > > procedure Split_Cursor (Object : in Parallel_Iterator; > Split_Index : in Natural; > Result : out Cursor); > I am thinking of going with the function version, because a) It is easier to work with, as it possibly saves having to declare two objects, since the client needs both a start and end cursor, and b) Because I suspect compilers would typically apply a transformation of the users code to a procedure call accepting two cursors. The function results can be inline expressions for the parameters to the call, so I dont think we'd be saving an assignment in that case. Otherwise you have to assign to the declared cursor variables, and then assign again to the parameters of the call. eg. -- 2% Raise to everyone! parallel for Employee of List loop Employee.Salary := @ + @ * 0.02; end loop; might get transformed to something like; declare procedure Give_Raise (From, To : Employee_Lists.Cursor) is Position : Employee_Lists.Cursor := From; use type Employee_Lists.Cursor; begin -- Get_Payroll while Position /= To loop -- 2% Raise to everyone! List (Position).Salary := @ + @ * 0.02; Employee_Lists.Next (Position); end loop; end Give_Raise; Iterator : Emp_List_Iterators.Parallel_Reversible_Iterator'Class := List.Iterate; begin Iterator.Split (Advised_Split => 4); parallel for Chunk in 1 .. Iterator.Split_Count loop Give_Raise (From => Iterator.Get_Cursor (Chunk), To => Iterator.Get_Cursor (Chunk + 1)); end loop; end; **************************************************************** From: Brad Moore Sent: Friday, June 15, 2018 11:43 AM > Note that the internal cursor array (inside of a > Parallel_Iterator'Class > object) could be fixed length (meaning that no more splits than the > max number is supported) or it could be dynamically allocated. I don't > see much reason to support massive numbers of splits anyway, certainly > not to require it. (Perhaps we might give a minimum number in > Implementation Advice?) The number of splits should typically be at least as many as the number of available cores. For some platforms that number might be unknown. For example, for IRTAW, I was interacting with a couple of people from the Barcelona supercomputing center, where they were experimenting with parallelism on 50 core computers. They managed to get my Paraffin code running on that system, so I think it was a Linux based system. I think storing the cursor array in the iterator for such a system would suggest that the cursor array would needs to be dynamically allocated, (by the Split call). As otherwise, the iterator object could become a very large object to accomodate the largest possible number of cores, and that might cause troubles if declared on a stack. If you know your target platform only supports a small number of cores, then a fixed size array could be appropriate. Another advantage of dynamically allocating is that if the Split function is not called, (for example for a non-parallel loop), then nothing needs to be freed upon finalization of the iterator, and no storage is needed for the cursor array during the lifetime of the iterator object. **************************************************************** From: Randy Brukardt Sent: Friday, June 15, 2018 2:37 PM I mentioned using a statically sized array because this has to work in the bounded forms, where dynamic allocation and finalization is strongly discouraged. Perhaps there is some way to use the secondary stack in GNAT for this allocation, but otherwise the size needs to be known early. I thought that a 64 cursor array probably would be enough for most existing targets, and the size of that (128 words) is not going to be so large as to cause stack problems. There's probably going to be memory use issues if someone has 10,000 cores, but that's mostly theoretical (and I've read that the number of cores is likely to be limited by memory contention issues). For the unbounded forms, it's probably better to use dynamic allocation, but obviously that's up to the implementation. **************************************************************** From: Randy Brukardt Sent: Friday, June 15, 2018 2:50 PM > I am thinking of going with the function version, because > > a) It is easier to work with, as it possibly saves having to declare > two objects, since the client needs both a start and end cursor, and I thought the compiler would declare them somewhere anyway. In any case, having thought on it more, I doubt that there would be any difference since the objects in question are compiler-generated and thus it is not necessary to preserve Ada semantics for their assignments. > b) Because I suspect compilers would typically apply a transformation > of the users code to a procedure call accepting two cursors. The > function results can be inline expressions for the parameters to the > call, so I dont think we'd be saving an assignment in that case. > Otherwise you have to assign to the declared cursor variables, and > then assign again to the parameters of the call. I expected that the "manager" (the main thread) would set of an array of "job" records, and the procedure call would simply pass one of the job records. For this sort of iterator, the job would contain the two cursors. Note that you're not really saving anything by directly passing cursors this way. Cursors don't fit in a parameter on most targets, so they're probably getting passed by reference. So there has to be a temporary object declared somewhere to allow that. If you're generating this expansion at the intermediate code level (as we would do in Janus/Ada), you have declare all of these things explicitly anyway (nothing implicit in intermediate code!). So while there might be a small convinience when using a library (like Paraffin), it's irrelevant to an implementation. (At least some compilers implement functions like this as procedure calls anyway, so there isn't any difference at the implementation level - unless an extra temporary is needed to meet Ada semantics, as in an assignment.) **************************************************************** From: Brad Moore Sent: Friday, June 15, 2018 2:14 PM Here is an update to AI12-266, Container Iterators [This is version /03 of the AI - Editor.] The main changes are to eliminate the cursor_array interface, and instead imply that a cursor array or set of cursors is stored in the iterator object, when Split is called. This approach overall is a lot simpler than the previous version. There is no need for the Iterator_Cursor_Array aspect that the ' previous proposal had. This gives the container implementor more freedom in designing data structures for storing the loop parameters needed for parallel iteration. I also changed the name of the calls, from To_Cursor to Get_Cursor, and from Length to Split_Count, which I think adds clarity. Also, instead of using Natural as the subtype for split indexes and cursor counts, I defined subtype Cursor_Count is Positive; subtype Split_Index is Positive; Which I think also adds clarity and ties the calls Split, Split_Count, Get_Cursor, together more meaningfully. Finally, I also added some pre and post conditions which help to make it clearer how these calls work. I would have liked to specify post conditions that check that Has_Element(Get_Cursor (I)) for all I in 1 .. Object.Split_Count is true after calling Split, or a post condition for Get_Cursor such as Post => (if Index = Object.Split_Count + 1 then Has_Element(Get_Cursor'Result) = False But the compiler tells me that one cannot use Has_Element in a Pre or Post condition in this package because the call involves a non tagged incomplete type (Cursor) I suppose we could add these as Pre and Post contracts to each of the containers but the iterator types are not visible in the public part of the package, so it seems not worthwhile to do so. **************************************************************** From: Randy Brukardt Sent: Friday, June 15, 2018 9:07 PM > Here is an update to AI12-266, Container Iterators > > The main changes are to eliminate the cursor_array interface, and > instead imply that a cursor array or set of cursors is stored in the > iterator object, when Split is called. This approach overall is a lot > simpler than the previous version. > There is no need for the Iterator_Cursor_Array aspect that the ' > previous proposal had. Thanks, this seems a lot simpler. For the record, I one editorial fix. Part of the interface referenced the package "Ada202x". I presume this is something you used in your testing, since there is no such thing proposed here or anywhere else (nor did it appear in the previous version of the AI). So I replaced all of those with "Ada". I'd probably add a bit of discussion similar to our recent back-and-forth on how the cursor array could get implemented in the iterator object. Some people might be concerned about the apparent need for dynamic allocation, and that should explicitly addressed. ... > I would have liked to specify post conditions that check that > Has_Element(Get_Cursor (I)) for all I in 1 .. > Object.Split_Count is true after calling Split, or a post condition > for Get_Cursor such as Post => (if Index = Object.Split_Count + 1 then > Has_Element(Get_Cursor'Result) = False > > But the compiler tells me that one cannot use Has_Element in a Pre or > Post condition in this package because the call involves a non tagged > incomplete type > (Cursor) That sounds correct. Perhaps the rule is a bit more restrictive than it needs to be (one could allow the calls and then require a recheck for an instance - an assume-the-best rule), but changing that doesn't sound pleasant. > I suppose we could add these as Pre and Post contracts to each of the > containers but the iterator types are not visible in the public part > of the package, so it seems not worthwhile to do so. Don't see the point. The most important use of such Pre/Post is to eliminate the need to describe these restrictions and results in English. And they wouldn't apply to user-defined iterators if you only put them into the container. **************************************************************** From: Brad Moore Sent: Saturday, June 16, 2018 12:21 AM > For the record, I one editorial fix. > > Part of the interface referenced the package "Ada202x". I presume this > is something you used in your testing, since there is no such thing > proposed here or anywhere else (nor did it appear in the previous version of > the AI). So I replaced all of those with "Ada". Yes, that was just for testing, I forgot to do the change you did this time round. > I'd probably add a bit of discussion similar to our recent > back-and-forth on how the cursor array could get implemented in the > iterator object. Some people might be concerned about the apparent > need for dynamic allocation, and that should explicitly addressed. I have attached another version with the discussion section updated to mention the possibility of implementation using fixed size array vs dynamic allocation. [This is version /04 of the AI - Editor.] > ... >> I would have liked to specify post conditions that check that >> Has_Element(Get_Cursor (I)) for all I in 1 .. >> Object.Split_Count is true after calling Split, or a post condition >> for Get_Cursor such as Post => (if Index = Object.Split_Count + 1 >> then Has_Element(Get_Cursor'Result) = False >> >> But the compiler tells me that one cannot use Has_Element in a Pre or >> Post condition in this package because the call involves a non tagged >> incomplete type >> (Cursor) > > That sounds correct. Perhaps the rule is a bit more restrictive than > it needs to be (one could allow the calls and then require a recheck > for an instance - an assume-the-best rule), but changing that doesn't > sound pleasant. The more I think about this, the more I think one ought to be able to specify such class-wide pre and post conditions for abstract subprograms. I wonder if there might be a way to relax the rules in a simple way to allow this at least. Anyway, you are probably right that it might not be pleasant, but maybe something to consider if it would be worthwhile, at some point. **************************************************************** From: Randy Brukardt Sent: Saturday, June 16, 2018 2:56 AM > > I'd probably add a bit of discussion similar to our recent > > back-and-forth on how the cursor array could get implemented in the > > iterator object. Some people might be concerned about the apparent > > need for dynamic allocation, and that should explicitly addressed. > > I have attached another version with the discussion section updated to > mention the possibility of implementation using fixed size array vs > dynamic allocation. You'll excuse me if I don't post this now; it's 12 hours past the deadline, I've already posted the agenda and the files and loaded my laptop and there's water on the floor in here from a severe thunderstorm that just went thru... > > ... > > That sounds correct. Perhaps the rule is a bit more restrictive than > > it needs to be (one could allow the calls and then require a recheck > > for an instance - an assume-the-best rule), but changing that > > doesn't sound pleasant. > > > > The more I think about this, the more I think one ought to be able to > specify such class-wide pre and post conditions for abstract > subprograms. The problem has nothing to do with preconditions of abstract subprograms. It's that you're doing it in a generic with a formal incomplete type that isn't supposed to be used in any way other than to declare access types and subprogram parameters. And that was because we needed this generic to work on a *private* cursor. And you might remember that we were never able to find an adequate solution for *that* problem. > I wonder if there might be a way to relax the rules in a simple way to > allow this at least. > Anyway, you are probably right that it might not be pleasant, but > maybe something to consider if it would be worthwhile, at some point. I was thinking that we might be able to allow it in a generic if we require the recheck. However, the recheck would fail in this case (the type being private) so I don't think that helps much. In any case, incomplete types are not supposed to have any sort of object created for them, and allowing this case without allowing anything that causes real trouble sounds painful. It would have been better if we didn't need to use this hack to get an instance of a private type - but we *know* that was painful. **************************************************************** From: Brad Moore Sent: Sunday, October 14, 2018 3:53 PM Here is a modest update to AI12-0266-1 Parallel Container Iteration. [This is version /05 of the AI - Editor.] Mostly the changes involve renaming types and subprograms to address ARG comments. A Chunk_Finished call was added to the Iterator_Interfaces, rather than expecting the compiler to compare the current cursor with another cursor value, which cant be counted on, since equality checks are not part of the interface. One thought for consideration was to have the Split_Into_Chunks call somehow temporary invalidate all existing cursors, and have the First, Next, Last, and Previous only allow retrieving cursors associated with the current chunk and a particular logical thread of control. That idea held some appeal, as it would seem to add safety to the containers. It would be more difficult for the user to write code to iterate from one chunk into another. That would also allow us to eliminate the call to Chunk_Finished. Since we are already calling Next to Advance the cursor, we could define Next to return No_Element if one advances to the end of a chunk, and do the usual comparison against No_Element to terminate the iteration. In any case, this idea did not make it into the AI in any form. Just food for thought. Perhaps that idea could be discussed at the meeting if it has any merit. **************************************************************** From: Randy Brukardt Sent: Tuesday, October 16, 2018 11:36 PM ... > One thought for consideration was to have the > Split_Into_Chunks call somehow > temporary invalidate all existing cursors, and have the > First, Next, Last, and Previous only > allow retrieving cursors associated with the current chunk > and a particular logical thread of control. > > That idea held some appeal, as it would seem to add safety to > the containers. ... Sure, but what would the implementation be? A canonical cursor is a pair of access values (one to the container, one to the node containing the element). How would the implementation figure out if the cursors are "temporarily invalidated"? Walking lists in the container might work, but would be O(N) and cursors are supposed to be O(1). And you'd have to worry about tasking interactions (there are many logical threads going here at once). Sounds to me like it would make tampering look like a cheap idea in contrast. :-) --- A nitpick: >A parallel iterator provides a Split_Into_Chunks procedure that can be used >to associate with the iterator a set of cursors that define the chunk >boundaries for parallel iteration. Each cursor is assigned to a different >logical thread of control ... Probably should be "can be assigned", since there is no requirement on the implementation to use different threads of control for each chunk. (One might like that, but it's not always practical.) Or, maybe just drop the whole discussion about "logical thread of control" here and just focus on the chunk. I'll let you decide on this, I know you are sending me another change that we've been discussing privately. --- > ... A parallel reversible iterator type is a type descended from both the Parallel_Reversible_Iterator interace from some instance of Ada.Iterator_Interfaces.} "both"? There's only one "interace" mentioned. "Interace" sounds like it could get political. :-) I fixed both of these. --- I reformatted a lot of paragraphs to keep the lines under 80 characters. Some people appreciate that. **************************************************************** From: Randy Brukardt Sent: Monday, November 19, 2018 11:18 PM The (very drafty) minutes for AI12-0266-1 say: > Can Split return zero chunks? Randy thought not, since something has > to be iterated, but others think that is also silly. So we need to > adjust the subtypes. OTOH, approved AI12-0251-1 (which defines a chunk_specification) says: > When the chunk_specification is elaborated, a check is made that the > determined maximum number of logical threads of control is greater > than zero. If this check fails, Program_Error is raised. So here we're not allowing zero chunks. It seems weird to me that a parallel iterator would allow zero chunks even though the underlying loop would not. Brad had used a subtype to ensure that the routines couldn't return no chunks, and it seems to me that that would be more consistent with the chunk_specification. Thoughts?? **************************************************************** From: Tucker Taft Sent: Tuesday, November 20, 2018 9:19 AM > So here we're not allowing zero chunks. It seems weird to me that a > parallel iterator would allow zero chunks even though the underlying loop > would not. I see these as somewhat different. One is specifying the maximum number of chunks. The other is returning the actual number of chunks. It seems weird to ever specify a maximum of zero, but it seems fine to have an actual number of chunks being zero. I suppose we could allow a maximum of zero if and only if the underlying iteration has zero elements. This would allow a computation that divided the number of iterations by the chunk size, always rounding up. > Brad had used a subtype to ensure that the routines couldn't return no > chunks, and it seems to me that that would be more consistent with the > chunk_specification. > > Thoughts?? I think it should be OK for the Split routine to return zero chunks if there are no elements. I don't believe there is a promise that it returns the maximum number of chunks. but only that the number of chunks is no greater than the specified maximum. So I think things are OK as is, perhaps modulo allowing a zero maximum if the number of elements is known to be zero as well. **************************************************************** From: Randy Brukardt Sent: Tuesday, November 20, 2018 1:21 PM The number of chunks is also the number of logical threads of control. And I can't see how anything can be executed with zero logical threads of control, even if there are no elements. (It usually takes some execution to determine that there are zero elements.) Perhaps your mental model is that the actual number of logical threads of control is the number of chunks plus one for loop control? I don't see how that could be derived from the wording as given; if that's your intent, there needs to be at least an AARM note to that effect. **************************************************************** From: Tucker Taft Sent: Tuesday, November 20, 2018 1:54 PM > The number of chunks is also the number of logical threads of control. > And I can't see how anything can be executed with zero logical threads > of control, even if there are no elements. (It usually takes some > execution to determine that there are zero elements.) I think that was probably why the original AI disallowed ever having a maximum of zero. But I think the obvious thing would be that if the maximum number of chunks is <= 1, then you don't bother spawning any new threads. > Perhaps your mental model is that the actual number of logical threads > of control is the number of chunks plus one for loop control? I don't > see how that could be derived from the wording as given; if that's > your intent, there needs to be at least an AARM note to that effect. I certainly don't feel that strongly about allowing a maximum of zero. On the other hand, you already have a thread which is elaborating the chunk specification, so there is nothing preventing that thread from at least noticing that the maximum is zero, and performing the check that in fact there are no iterations to perform. So I don't see it as a big deal either way, and I don't really see the need for an AARM note, since the check that there is in fact nothing to do is reasonably done before we start spawning any additional threads. **************************************************************** From: Randy Brukardt Sent: Wednesday, November 21, 2018 12:00 AM If that's the logic (and that's fine), then there doesn't seem to be any reason to disallow a maximum of zero (it just would mean that the original thread executes the loop, if any execution is needed). I'd expect that 1 chunk would mean the same thing (why spawn one thread and have the original thread sitting around waiting for it?), but at least logically it would be different. Certainly we don't want a negative maximum, so we need some sort of check in any case. The idea would be that the number is the maximum of *extra* logical threads (some of which might get mapped to the existing thread that is elaborating the loop). If that's the case, then zero makes sense -- not a lot, as it essentially would cancel the "parallel" keyword of the loop (of course, any Legality checks would remain), but it wouldn't be nonsense. Anyway, if we want to change this to between these consistent (and I do, one way or the other), then we need some proposed revised wording to put into a clean-up AI. A suggestion? **************************************************************** From: Tucker Taft Sent: Wednesday, November 21, 2018 10:26 AM On further thought, I don't really think it works to have a zero maximum in the case where there is a chunk parameter if the number of iterations is non-zero, because what value would the chunk parameter have inside the loop? We could allow zero chunks when there is no specified chunk parameter, or when a check indicates there are no iterations, but allowing zero chunks in general and just saying the loop should be treated as a sequential loop doesn't work in the presence of a chunk parameter. I am just about convinced to go back to having a minimum of one chunk determined by the chunk specification, and let the programmer use Integer'Min or equivalent to prevent zero showing up. I also still think it is fine for Split to return fewer chunks than the maximum, including zero chunks. I don't see the contradiction. I would say the "Maximum_Split" parameter (or whatever it is now called) should be required to be greater than zero for consistency with this AI, but it seems fine if Chunk_Count returns zero when there are zero elements in the container. **************************************************************** From: Randy Brukardt Sent: Wednesday, November 21, 2018 2:59 PM > ... I also still think it is fine for Split to return fewer chunks > than the maximum, including zero chunks. > I don't see the contradiction. I would say the "Maximum_Split" > parameter (or whatever it is now called) should be required to be > greater than zero for consistency with this AI, but it seems fine if > Chunk_Count returns zero when there are zero elements in the > container. That only makes sense to me if we ban empty chunks otherwise. With the anything goes scheme you are proposing, you have the overhead to check for empty loops *and* overhead to check for zero chunks. That seems silly. (In almost all cases, null ranges and the like require dedicated code at the machine level.) My understanding of Brad's original proposal is that empty chunks were allowed but there had to be at least one chunk. That makes the most sense to me, since for some data structures it might be expensive to guarantee that elements exist in a particular range (think graphs or trees). In that case, allowing no chunks is just extra overhead, as that has to be checked for explicitly before starting, and you're doing a similar check anyway for each chunk. OTOH, if you were to not allow empty chunks, then it makes sense for an empty container to return no chunks. In that case, the overhead of checking for empty moves to the chunk part of the code (and not in the individual chunk loops). I think this is not as good of a model because it would require expensive complete walks of some data structures -- but it's not a critical issue as I don't think it matters for any of the existing containers. (It matters mainly for hypothetical user-defined containers, clearly a less important problem.) Requiring both is madness -- there should only be one way to present an empty container to the parallel iteration. But that seems to be what you have suggested. **************************************************************** From: Tucker Taft Sent: Wednesday, November 21, 2018 4:19 PM > ... > Requiring both is madness -- there should only be one way to present > an empty container to the parallel iteration. But that seems to be what > you have suggested. I guess I don't see it that way. Both seem reasonable. One or more empty chunks are fine, even if there are elements in other chunks, and it is fine to have no chunks at all when there are no elements. They seem orthogonal. Obviously you feel differently. Perhaps a straw vote? **************************************************************** From: Randy Brukardt Sent: Wednesday, November 21, 2018 6:44 PM > I guess I don't see it that way. Both seem reasonable. One or more > empty chunks are fine, even if there are elements in other chunks, and > it is fine to have no chunks at all when there are no elements. They > seem orthogonal. "Parallel" exists to improve performance. Adding extra code (and thus execution time) into the iteration scheme just because you feel it is "fine" for the items to be empty doesn't make any sense to me. Either one or the other should be required to be non-empty so the extra overhead of the test that is inevitably required for dealing with the empty case is eliminated. That does seem to argue that the chunks should be non-empty (as that overhead is repeated for each chunk, while the other test only occurs once), but I don't care a lot which is chosen. If we were just defining an interface without any performance implications, then I'd probably agree with you (more flexible is better). But that's not the case here. This is the same reason that I got Brad to invert the interface so that the iterator queries for each chunk rather than trying to return them all at once -- we have to wring out every bit of avoidable overhead. And this is overhead that can be avoided. > Obviously you feel differently. Perhaps a straw vote? I don't think we can effectively conduct a straw poll here, as only a few people will have followed the conversation enough to have an informed opinion. Most likely, Brad will chose some approach and then you or I or both can argue at the upcoming meeting for a different approach. It's only a couple of weeks from now anyway. **************************************************************** From: Brad Moore Sent: Thursday, December 6, 2018 12:22 AM I have a somewhat different perspective to offer here, as it appears that apples and oranges are being compared, or rather, Apples and Apple slices. I think the semantics of the chunk_specification syntax in AI12-0251-1 is about specifying the number of logical threads of control being applied to a loop, whereas the Split function in AI12-0266-1 is about splitting a container into separate chunks of elements. The Advised_Split parameter to the Split call logically makes sense to me as the recommended number of splits to be applied to the container. So if Advised_Split is specified as 0, then to me, it makes sense to think of that as zero splits are applied, ie one chunk. (Which could be an empty chunk if there are no elements in the container or iterator to be iterated. From this, an Advised_Split of 1 would mean split the container once, (i.e two chunks), and an Advised_Split of 2 would mean the the container twice, (i.e. three chunks) and so on. Thus, if N is the number of logical threads specified by the chunk_specification in a parallel loop, it would equate to N - 1 being passed as the Advised_Split parameter in a call to Split. This feels like it would be simpler to explain and understand. Otherwise if feels messy and kludgy to me to try to rationalize that 0 and 1 for Advised_Split mean the same thing. **************************************************************** From: Jean-Pierre Rosen Sent: Thursday, December 6, 2018 1:58 AM > So if Advised_Split is specified as 0, then to me, it makes sense to > think of that as zero splits are applied, ie one chunk. (Which could > be an empty chunk if there are no elements in the container or iterator to be iterated. > >>From this, an Advised_Split of 1 would mean split the container once, >>(i.e two chunks), > and an Advised_Split of 2 would mean the the container twice, (i.e. three > chunks) and so on. This looks to me as opening the door to endless discussions and explanations to users; although I'm one of these who hardly followed the discussions, it seems to me that if the name is that ambiguous, it should be changed: call it Advised_Chunks, there will be no discussion. **************************************************************** From: Tucker Taft Sent: Thursday, December 6, 2018 7:17 AM I prefer Max_Chunks. "Advised" is a weird term in this context, and implies that it is a "hint" rather than a maximum. **************************************************************** From: Randy Brukardt Sent: Thursday, December 6, 2018 4:54 PM That's not the question. (At least, it wasn't *my* question.) AI12-0251-1 makes it clear that the maximum number of chunks cannot be zero, ergo 0 cannot be passed to Max_Chunks (or whatever name for the parameter was decided upon in Lexington). If one tries to set the maximum number of chunks to 0, Program_Error is raised. My concern is Split *returning* zero chunks. I see absolutely no reason to allow that, given that we are allowing chunks to be empty. If we do allow returning zero chunks, the loop code would necessarily need a special case to deal with that (just like dealing with null slices and empty iterations burdens the generated code). Typically, in Ada we have no choice about allowing zero of something, but in this case, we do. We have to allow *something* to be empty, but there's no need to allow *everything* to be empty. It doesn't make sense for a loop to execute with zero threads, regardless of whether that loop does anything. Thus, I don't believe that Split should be allowed to return zero chunks (nor should the maximum to allowed to be zero). **************************************************************** From: Tucker Taft Sent: Thursday, December 6, 2018 8:27 PM Last time we went around on this I came to agree that we should at least disallow 0 for the "determined maximum number of chunks" in the latest update of the AI12-0251-1 AI. So hopefully this means we never have to allow zero for the "Max_Chunks" parameter (or whatever we end up calling it) to Split. As far as Split, I am fine if others believe it is better to disallow Split from returning zero chunks. It does seem marginally more efficient at the point of use to not have to worry about this special case. **************************************************************** From: Richard Wai Sent: Wednesday, November 21, 2018 12:10 PM I've been trying to get up to speed on the various AIs regarding parallel iteration, so please forgive me if I missed something! I generally take a more application-specific view on Ada. Direct use of the generalized iteration and indexing facilities by the programmer can be extremely useful in a lot of cases. We have actually used these quite a bit in our work. So with that perspective, I'm pretty interested in continuity/clarity in implementing parallelized iteration. It's probably safe to say that Chunk_Finish is related to Has_Element. Yet it's name suggest a disconnect, which could be confusing, especially for direct programmer application. Where one iterates until Has_Element returns False*, it could seem that one iterates on a Chunk until Chunk_Finished returns True. * I note here that the original AI does suggest that Chunk_Finished returns False at the end of a Chunk, and I don't see any specific amendment that changes this. This is obviously confusing, as one would expect Chunk_Finish to return True when the Chunk is finished, though I recognise that we iterate until Has_Element returns False, so that would be consistent. So here is my humble suggestion: Change Chunk_Finished to Chunk_Has_Element, and specify that it returns True if and only if Position is a member of the chunk indexed by Chunk. This also makes implementation of Chunk_Index a bit easier, as it can simply specify a first/last Cursor, which Chink_Has_Element can check against easily enough. -- As a further suggestion, in this case, I think Start_Of_Chunk should then also be renamed to Chunk_First, for consistency. **************************************************************** From: Tucker Taft Sent: Wednesday, November 21, 2018 12:21 PM Good suggestions, Richard! **************************************************************** From: Richard Wai Sent: Wednesday, November 21, 2018 12:52 PM If only I had run my email through the Ada compiler first, it would have found all of my typos! : "Chunk_Finish" is undefined : "Chink_Has_Element" is undefined **************************************************************** From: Randy Brukardt Sent: Wednesday, November 21, 2018 2:46 PM > Good suggestions, Richard! I concur, but a caution: many AIs were extensively changed by the discussions at the October ARG meeting, but neither the minutes nor the updated AIs (in the case where they were approved) are available yet. (I'm working on those as hard as I can with the holiday week.) So you're necessarily looking at obsolete versions of the AIs. In this particular case, we changed many of the names of the routines, as well as many of the parameters, but not the specific one you are concerned about. **************************************************************** From: Brad Moore Sent: Wednesday, January 9, 2019 3:40 PM I was looking at my test program for the Parallel Iterator proposal, and it was bothering me that the implementation for the parallel loop was issuing two calls to the interface for every iteration. Process_Loop : while not Chunk_Finished (...) loop Loop_Body; Position := Iterator.Next (Position); end loop Process_Loop; I am viewing subprogram calls into an interface as being expensive. I believe this can be reduced to a single call. So I am proposing modifying the Start_Of_Chunk call to have a Boolean out parameter, End_Of_Chunk, And then replacing the Chunk_Finished call with an override of Next that take a Chunk_Index as an input and has an out parameter, End_Of_Chunk. as in; function Start_Of_Chunk (Object : Parallel_Iterator; Chunk : Chunk_Index; End_Of_Chunk : out Boolean) return Cursor is abstract with Pre'Class => Object.Is_Split and then Chunk <= Object.Chunk_Count; function Next (Object : Parallel_Iterator; Position : Cursor; Chunk : Chunk_Index; End_Of_Chunk : out Boolean) return Cursor is abstract with Pre'Class => Object.Is_Split and then Chunk <= Object.Chunk_Count; Then the implementation loop would look something like; procedure Chunk_Code (Chunk : My_Interface.Chunk_Index) is End_Of_Chunk : Boolean := False; Position : Employee_Lists.Cursor := Iterator.Start_Of_Chunk (Chunk, End_Of_Chunk); begin -- while not End_Of_Chunk loop -- Loop_Body Position := Iterator.Next (Position, Chunk, End_Of_Chunk); end loop Process_Loop; -- Finalize end Chunk_Code; I believe this would potentially improve the execution time of the loop, which is the main point for parallel iterators, so unless I hear otherwise, I am proceeding with this change to the Parallel Iterator interface in the writeup of the AI. Comments? **************************************************************** From: Randy Brukardt Sent: Wednesday, January 9, 2019 3:58 PM > I am viewing subprogram calls into an interface as being expensive. > I believe this can be reduced to a single call. I suppose, but that would be different than the usual iterators. It seems best for everyone if these work as similarly as possible. I note that this particular interface can be implemented rather like an abstract type, at least in the usual use cases, so dispatching doesn't have to be as expensive as it is in cases of multiple inheritance. > So I am proposing modifying the Start_Of_Chunk call to have a Boolean > out parameter, End_Of_Chunk, > > And then replacing the Chunk_Finished call with an override of Next > that take a Chunk_Index as an input and has an out parameter, > End_Of_Chunk. Huh? You can't "override" a routine and change the profile. That's called "overloading". And Ada's rules would require that you still implement the older (and now unused) Next, which doesn't seem great for usability. Seriously, Brad, we're far beyond the point where you ought to be tinkering with the interface itself. We discussed the interface and asked for a bunch of name and subtype changes, not a wholesale overhaul. (And then we requested a few addition changes in e-mail threads last month.) We can never finish these things if you redesign them every time you look at them - we've already agreed to a design. **************************************************************** From: Brad Moore Sent: Wednesday, January 9, 2019 4:28 PM >> I am viewing subprogram calls into an interface as being expensive. >> I believe this can be reduced to a single call. > > I suppose, but that would be different than the usual iterators. It > seems best for everyone if these work as similarly as possible. I note > that this particular interface can be implemented rather like an > abstract type, at least in the usual use cases, so dispatching doesn't > have to be as expensive as it is in cases of multiple inheritance. Its not necessarily the dispatching that I am concerned about. It's been my experience that subprogram calls in general are expensive. Adding an extra call for every iteration seems a lot worse than the concern you had about checking to see at the beginning of the loop whether there were zero iterations, which you were quite adamant about eliminating the overhead for. And using your argument, if performance weren't the main point of this feature, then I would agree with you, but since we want to in theory squeeze out whatever performance we can, it seems we should be eliminating this overhead, since it is possible to do so. >> So I am proposing modifying the Start_Of_Chunk call to have a Boolean >> out parameter, End_Of_Chunk, >> >> And then replacing the Chunk_Finished call with an override of Next >> that take a Chunk_Index as an input and has an out parameter, >> End_Of_Chunk. > > Huh? You can't "override" a routine and change the profile. That's > called "overloading". And Ada's rules would require that you still > implement the older (and now unused) Next, which doesn't seem great for usability. Sorry, overload is what I meant. And actually, one of Tuckers comments was to rename Start_Of_Chunk to First_In_Chunk, and then have a subprogram called Next_In_Chunk, rather than Chunk_Finished in the name of making the usage look more similar to other interfaces. So there actually would not any overload, and having the Next_In_Chunk is actually addressing a comment in the minutes from our last discussion of this AI. Plus, I think the First_In_Chunk, Next_In_Chunk calls end up looking more similar to other iterator interfaces this way. Plus it would not make sense to have a call Next_In_Chunk at the top of the loop. It needs to be at the end of the loop to make sense. So, I really think the changes I am proposing fall in scope for what was asked in the last discussion. **************************************************************** From: Brad Moore Sent: Wednesday, January 9, 2019 4:43 PM Also, in the name of addressing the comments from the Lexington meeting, I would eliminate the End_Of_Chunk boolean output parameters, and instead return a NULL Cursor if we are at the end of the chunk. Which makes the interface look even more similar to the other iterator interfaces. **************************************************************** From: Brad Moore Sent: Wednesday, January 9, 2019 4:56 PM Note that by returning a null cursor, though, we'd be requiring a call to Has_Element at the top of the loop eg while Has_Element (Position) loop So we are back to having two calls per iteration. I suppose a compile could inline that call though. Right? In which case, I think case, I think my concerns would be addressed, as well as addressing comments raised against the AI at the last discussion of the AI. **************************************************************** From: Tucker Taft Sent: Wednesday, January 9, 2019 4:02 PM By all means please make this look as much as possible like a normal iterator. **************************************************************** From: Randy Brukardt Sent: Wednesday, January 9, 2019 5:22 PM Agreed. To the extent that some other interface could have provided better performance, that's a mistake that was made when the original iterators were designed and it's rather too late to change that now. (As I recall, the intent was that the iterator interface would operate similarly to the way one would write an iterator explicitly using the operations of the containers. Originally, the iterators shared the implementation with the containers, but since that didn't work for multiple iterations on the same object, we abandoned that model but never really rethought the interface.) **************************************************************** From: Brad Moore Sent: Wednesday, January 9, 2019 11:36 PM Attached is my homework for AI12-0266-01, Parallel Container Iterators. [This is version /07 of the AI - Editor.] The summary of the main changes are; - Changed Nonblocking of everything in the Iterator_Interfaces to depend on the Nonblocking of the generic formal parameters - Removed the Iterations call from the Parallel interfaces - Added Is_Split to use as precondition for the parallel iterator calls, and raises Program_Error if Split_Into_Chunks is called more than once, or if the other calls are made without having called Split_Into_Chunks - Renamed parameter Maximum_Split of Split_Into_Chunks to Max_Chunks A number of splits is confusing. If number of splits is 1, does that mean 1 or 2 chunks. Renaming to Max_Chunks eliminates the confusion. - Renamed Start_Of_Chunk to First_In_Chunk - Replaced Chunk_Finished call with Last_In_Chunk so the iteration looks much more similar to the existing iterator interfaces - Added a Last_In_Chunk and Previous_In_Chunk call for Parallel_Reversible_Iterator_Interfaces **************************************************************** From: Randy Brukardt Sent: Thursday, January 10, 2019 10:46 PM > Attached is my homework for AI12-0266-01, Parallel Container > Iterators. Editorial fixes applied: The first bullet under "goals of this proposal" needs to end with a semicolon ";" like the later bullets. All of the Modify and Adds need colons at the end (":"). If I'm going to complain to Justin, I have to complain to you, too. ;-) The way you dealt with parameters on multiple lines is not the way the RM typically does it (I checked Containers, Text_IO, and Calendar): -- RM: procedure Foo (A : in Natural; B : in String); -- You: procedure Foo (A : in Natural; B : in String); I had to adjust all of the specs for the Nonblocking anyway (see below), so I fixed this now. Technical issues (fixed in version /08, a minor redraft): The grammar for 5.5(3) was extensively changed by AI12-0251-1. This grammar change needs to reflect that. If we didn't take AI12-0251-1 and AI12-0294-1 into account here (otherwise we'd have to write a new AI nearly as big as this one to change everything to match that). See more below in stuff I *didn't* do. --- I gave you the wrong info about Nonblocking. We can't change the instance Nonblocking, as that could be incompatible with existing code. (That is, if the generic formals are Nonblocking, then the overriding routines for the base interfaces could not allow blocking (Nonblocking => False). But that's the default for Nonblocking, so such a change would reject code that is currently legal. I was thinking that this was in a separate generic, but obviously it isn't. Therefore, we have to put the Nonblocking spec on each individual subprogram declared for the parallel interface. (Unlike the others, we can get the Nonblocking right here, as there is no compatibility problem. Most implementations will require Nonblocking => True, but that's a good thing!) --- Modify AARM 5.5.2(8.a/5) and move into the Dynamic Semantics section after 5.5.2 I don't know why you added this move; the note is a Ramification of the way the loop parameters are declared (which determines their lifetime); while the consequences are dynamic semantics, the reason that they occur is because of the static semantics. That can't change just because of the parallel keyword, if you believe that is not true then we need wording to actually make this requirement. I left the note where is was, pending further justification. --- Technical issues (not handled but very important): You failed to reconcile the AI with the extensive changes to add chunk specifications. The syntax and semantics in 5.5 were changed substantially, and you need to hook your stuff to that. In particular, you need to explain that Max_Chunks comes from the value evaluated for the chunk_specification if one is present, and that it is implementation-defined otherwise. (That definition of Max_Chunks). And if there is a chunk_parameter, you have to define how that maps to the chunks returned from this interface. You probably can steal some of the revised wording -- that's all in the latest RM draft. (If you don't have it yet, you ought to be able to download it soon.) That was too much for me to do tonight, so I'm leaving it for someone else. ****************************************************************