Version 1.1 of ai05s/ai05-0139-2.txt

Unformatted version of ai05s/ai05-0139-2.txt version 1.1
Other versions for file ai05s/ai05-0139-2.txt

!standard 5.5          09-12-08 AI05-0139-2/01
!class Amendment 09-02-13
!status work item 09-02-13
!status received 09-01-19
!priority Medium
!difficulty Medium
!subject Syntactic sugar for accessors, containers, and iterators
(See proposal.)
Using an iterator over one of the Ada 2005 containers libraries requires writing a subprogram which gets called repeatedly with each element of the container. That effectively requires writing the loop body out-of-line from the loop, making the code inside-out of its normal flow, requiring the writing of a lot of extra text, and thus making it much harder to read and understand.
A better mechanism is required.
In addition, using the notion of "accessors" or "references" as defined in AI05-0142 is syntactically awkward. Finally, it is often clearer to iterate over the elements of a container (or array) directly, rather than iterating over a cursor (or index) into the container (or array).
Using accessors for user-defined dereference (rather than simple access values as proposed in AI05-141) has the potential to provide a safer mechanism that might be used for persistence, relocating garbage collection, detection of dangling references, etc. However, using accessors would be syntactically painful compared to a simple ".all."
In all these cases, a modest amount of "syntactic sugar" might make the use of accessors, containers, and iterators more natural and easier to read.
Define a set of "magic" generic units which are linked to some appropriate syntactic sugar for accessors, containers, and iterators:
generic package Iterator_Interfaces defines a pair of interfaces:
generic type Cursor is private; No_Element : in Cursor; package Ada.Iterator_Interfaces is
type Basic_Iterator is limited interface; function First (Object : Basic_Iterator) return Cursor; function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
type Reversible_Iterator is limited interface and Basic_Iterator; function Last (Object : Reversible_Iterator) return Cursor; function Previous (Object : Reversible_Iterator; Position : Cursor) return Cursor;
end Ada.Iterator_Interfaces;
Note that the routines defined in the magic generic generally match those used in iteration for the Ada.Containers packages, and their meaning is the same. The only difference is that we've added an iterator object parameter to the Next and Previous functions (which is necessary to make them part of the interface).
The interface Basic_Iterator defines a user-defined iterator for the forward direction; the interface Reversible_Iterator defines a user-defined iterator for both the forward and reverse direction.
The function First is expected to deliver the first cursor value for the iterator iterating in the forward direction, or return No_Element if no cursor value is available. The function Next is expected to deliver the next cursor value for the iterator iterating in the forward direction given the previous cursor value returned from a call to First or Next, or return No_Element if no additional cursor values are available.
The function Last is expected to deliver the last cursor value for the iterator iterating in the reverse direction, or return No_Element if no cursor value is available. The function Previous is expected to deliver the next cursor value for the iterator iterating in the reverse direction given the previous cursor value returned from a call to Last or Previous, or return No_Element if no additional cursor values are available.
The generic package Element_References defines two tagged types which can be used for accessing elements of a container or similar structure:
generic type Element_Type(<>) is limited private; package Ada.Element_References is type Element_Reference(Data : access Element_Type) is new Ada.Finalization.Limited_Controlled with private; type Element_Constant_Reference(Data : access constant Element_Type) is new Ada.Finalization.Limited_Controlled with private; end Ada.Element_References;
A function that returns a result of a type descended from Element_Reference or Element_Constant_Reference has ".Data.all" implicitly applied to its result at the point of call, if necessary to satisfy overload resolution. This allows the convenient use of accessor functions both on the left-hand-side and right-hand-side of an assignment, and in other contexts. For example:
function Lookup(Tab : Hash_Table; Key : String) return Element_Reference;
Lookup(T1, "hello") := Lookup(T1, "goodbye");
This last line is equivalent to:
Lookup(T1, "hello").Data.all := Lookup(T1, "goodbye").Data.all;
Our final "magic" generic package, Container_Interfaces, defines an interface to be used in conjunction with the Reversible_Iterator defined in Interator_Interfaces, and the element reference types defined in Element_References, as the basis for a container that takes full advantage of our new syntactic sugar (which is further defined below):
generic with package Iterators is new Ada.Iterator_Interfaces(<>); type Iterator_Type is new Iterators.Reversible_Iterator with private; with package References is new Ada.Element_References(<>); type Element_Reference(Data : access References.Element_Type) is new Element_References.Element_Reference with private; type Element_Constant_Reference(Data : access constant Element_Type) is new Element_References.Element_Constant_Reference with private; package Ada.Container_Interfaces is type Container is limited interface; function Default_Iterator(C : Container) return Iterator_Type is abstract; function Element_Ref(C : aliased in out Container; Cursor : Iterators.Cursor) return Element_Reference is abstract; function Element_CRef(C : aliased in Container; Cursor : Iterators.Cursor) return Element_Constant_Reference is abstract; end Ada.Container_Interfaces;
Indexed_component syntax may be used with any object of a type descended from the Container interface, as though it were a single-dimensional array indexed by the Cursor type. It is equivalent to a call on Element_Ref in any context that requires a variable; otherwise it is equivalent to a call on Element_CRef. For example:
V : Int_Vectors.Vector; C1, C2 : Int_Vectors.Cursor; ... C1 := ... C2 := ... V(C1) := V(C2);
The last line is equivalent to:
Element_Ref(V, C1).Data.all := Element_CRef(V, C2).Data.all;
These three generic packages would typically be instantiated as follows:
type Cursor_Type is ... package Iterators is new Ada.Iterator_Interfaces(Cursor_Type, No_Element); type Iterator_Type is new Iterators.Reversible_Iterator with ...
package References is new Ada.Element_References(Element_Type); type Element_Reference is new References.Element_Reference with ... type Element_Constant_Reference is new References.Element_Constant_Reference with ...
package Containers is new Ada.Container_Interfaces(Iterators, Iterator_Type, References, Element_Reference, Element_Constant_Reference); type Container_Type is new Containers.Container with ...
Given these three generic packages and their associated types, we then add two new kinds of "for" loop parameter specifications which provide for iteration over containers (and arrays) in a natural way:
defining_identifier IN [REVERSE] iterator_expression defining_identifier [: subtype_indication] OF container_expression
The first requires an expression returning an object of a type descended from Basic_Iterator, or Reversible_Iterator if REVERSE appears. The defining_identifier is of the corresponding Cursor type, and can be used to index into the container.
The second requires an expression returning an array, or an object of a type descended from the Container interface defined above.
An example of the first iterator would be:
V : Int_Vectors.Vector; ... for C in V.Iterate loop V(C) := V(C) + 1; -- Note that this is equivalent to: -- Element_Ref(V, C).Data.all := Element_CRef(V, C).Data.all + 1 end loop;
Some examples of the second iterator would be:
Arr : array(1..10) of Natural := ... ... for X : Natural of Arr loop -- The ": Natural" part is optional. -- X renames Arr(<index>) where <index> in Arr'Range. -- X is a variable if Arr is a variable.
X := X + 1; end loop;
Vec : Int_Vectors.Vector; ... for X of Vec loop -- X renames Vec(<cursor>) where <cursor> in Default_Iterator(Vec). -- and Vec(<cursor>) is equiv to Element_[C]Ref(Vec, <cursor>).Data.all. -- X is a variable if Vec is a variable.
X := X + 1; end loop;
The "of" syntax, suggested by J.P. Rosen, seems to help identify X as an element of the array/container rather than as an index or cursor in the sequence defined by the discrete range or iterator.
The generic package Element_References might also be relevant to the notion of a user-defined "dereference" operation suggested in AI05-141. One of the significant problems with AI05-141 as it stands is that there is no operation called when the dereference is "completed." If we presume that the purpose of a user-defined dereference might be to implement some kind of software paging or persistence, then the page on which the designated object resides needs to be "pinned" in memory until the dereference is complete. But there is no user-defined cleanup routine provided in AI05-141. The Element_Reference and Element_Constant_Reference could provide for that, because they are derived from Limited_Controlled.
So the suggestion is that, at least conceptually, there is an element reference object created, where the access discriminant comes from the result of the user-defined dereference, and when the resulting element reference object goes away, a user-defined cleanup routine would be called, perhaps to "unpin" the page on which the designated object resides.
For practical reasons, rather than having the implementor of the storage pool provide extensions of Element_Reference and Element_Constant_Reference, they would instead just provide dereference operations similar to what is suggested in AI05-141, which takes in an "address" that might in fact not really be a machine address but rather an address-sized "locator" for the object that was returned by Allocate, and produces as a result a true machine address, suitable for use as the access discriminant of an element reference object. There would be two of these operations, one for dereferencing an access-to-variable value, and one for dereferencing an access-to-constant value, and the work to be done might be quite different in the two cases (presuming, for instance, that the underlying storage pool is doing software paging). In addition to the dereference operations, the implementor of the storage pool would provide two "cleanup" routines, which would be called when the hypothetical element reference object goes out of scope, again one for access-to-variable and one for access-to-constant.
The import and export operations suggested in AI05-141 might still be provided for conversion to/from longer-lived access values.
AI05-141 suggested that the dereference operation be accomplished through an IN-OUT parameter of type System.Address, allowing it to be the null procedure by default. Given the number of operations we are now talking about, we would suggest defining a distinct ("magic") descendant of Root_Storage_Pool that had these extra operations, two of which would more naturally be functions than procedures. The cleanup operations would ideally take three parameters, the storage pool, the original access value (which might not be a real machine address), and the "real" machine address produced by the dereference operation. Having both the original and the dereferenced value should simplify the cleanup operation.
Here is a possible package System.Managed_Storage_Pools:
with System.Storage_Pools; package System.Managed_Storage_Pools is type Root_Managed_Storage_Pool is abstract new Storage_Pools.Root_Storage_Pool with private;
-- usual Allocate/Deallocate/Storage_Size routines, -- inherited as abstract
function Dereference(Pool : access Root_Managed_Storage_Pool; Access_Value : System.Address) return System.Address is abstract;
function Constant_Dereference(Pool : access Root_Managed_Storage_Pool; Access_Constant_Value : System.Address) return System.Address is abstract;
procedure Cleanup_Dereference(Pool : access Root_Managed_Storage_Pool; Access_Value : System.Address; Address_Value : System.Address) is null;
procedure Cleanup_Constant_Dereference(Pool : access Root_Managed_Storage_Pool; Access_Value : System.Address; Address_Value : System.Address) is null;
function Export(Pool : access Root_Managed_Storage_Pool; Access_Value : System.Address) return System.Address is abstract;
function Import(Pool : access Root_Managed_Storage_Pool; Address_Value : System.Address) return System.Address is abstract;
private -- not specified end System.Managed_Storage_Pools;
Note that the use of element references is completely behind the scenes with this approach. These access values work just like "regular" access values, but behind the scenes:
* on each dereference or conversion to an access parameter or access discriminant they get temporarily made available as "true" addresses via the user-defined dereference routines, and then get cleaned up.
* on conversion to an access type that is not for an access parameter or access discriminant, if the storage pool of the new access type is different, they get permanently converted into a "normal" address value via the Export function (which might raise PROGRAM_ERROR).
* on conversion from an access type of a different storage pool, the new access values are produced by calling the Import function, which might raise PROGRAM_ERROR if importing from another storage pool is not supported.
!wording [REMOVED FOR NOW]
This is based on the earlier proposal and discussion recorded in AC-0112.TXT. This proposal was modified to address the comments made on that one, and has been simplified considerably.
Note that the iterator interface can be directly applied to a container type (allowing it to be directly iterated) or it can be part of a separate function (which would allow adding additional parameters to the iterator and/or adding additional storage for the iterators). By using interfaces, we give maximum flexibility to the programmer that is using these iterators.
If the creator of the iterator needs extra storage to accomplish the iteration, they can include it in the type of the iterator.
If the creator of the iterator needs to get control when the loop is exited prematurely, they can define a controlled iterator type. In that case, (presuming that the iterator_expression was a function call) when the loop is exited, the Finalization routine for the iterator object will be called.
We provide a form for reverse iterators. Reverse iteration makes sense for many kinds of containers (vectors and lists, for instance). Given that the syntax already exists and every Ada programmer is familiar with it, not providing it would quickly become one of the most common frustrations of programming in Ada. That seems silly.
We could have simplified the proposal by requiring that all iterators support Last and Previous, but iterators where reverse iteration does not make sense would have to raise an exception when Last is called. The proposed features, in contrast, make a compile-time determination as to whether reverse is allowed. That will detect errors earlier; it generally is better to detect errors at compile-time if possible.
It might appear that this proposal would require some solution to the instantiation for private types problem, but as the examples below show, this is not true.
Note that the actual iterator type can be either limited or non-limited. The value of a limited iterator object is that an iterator implementation can assume (presuming that it only exports functions and not types) that the iterator object is created and destroyed with the loop. This allows safe cleanup operations when the loop is exited (as in the examples below). If that capability is not needed, an iterator implementation allow reuse of iterator objects (especially if no temporary storage is needed).
Note that if temporary storage or loop exit actions are needed, the iterator object must be separate from the container object that it is iterating on. Otherwise nested loops on the same container would not work.
for I in My_Container.Iterator loop for J in My_Container.Iterator loop ... end loop; end loop;
The I and J loops need separate temporary storage, which must be provided by the iterator object (the loop can only provide the loop parameter cursor object).
This proposal requires some solution to the modify-in-place problem for the containers packages (see the family of AI05-0142 proposals). Without that, modifying an element in an iterator would require writing an extra subprogram, meaning the new syntax would not have gained anything.
The use of a controlled type for an iterator in this proposal would require bounded containers to be preelaborable units, rather than pure units. We add a prohibition on the actual container from being controlled, so that finalization adverse users could still use the bounded containers (but obviously not the iterators). This seems OK, given that the current accessor proposal (AI05-0142-4) has similar requirements.
It would be nice for an iterator implementation to be able to prevent the creation of iterator objects not associated with a loop. For instance, given the implementation of the example below:
My_Iter : Some_Instance.Basic_Iterator'Class := My_List.Iterator;
would have the effect setting tampering on My_List and keeping it set until until the object is destroyed (potentially a long time in the future). But this is not the natural thing to do -- it requires a lot more typing than the intended usage -- and no major definitional problems are encountered in this case. We would need a lot more mechanism to prevent the creation of objects outside of the loop context, so we do not try to prevent this misuse.
Alternatives considered:
One option considered was including the name of the cursor type somewhere in the syntax. Something like:
defining_identifier : subtype_mark in [reverse] iterator_expression
That was not done mainly because differs from the existing syntax for no clear benefit. It would make the type of the identifier clearer, but it also would mean that switching from a discrete iterator to a container one would require some syntax changes that don't really fit into the model. (One solution to that problem would be to optionally add the type name to the discrete for loop as well, but that seems like a bunch of additional changes for little value.)
It should be noted that with this formulation, the existing discrete iterators could be considered just a predefined instantiation of this magic generic for discrete types. The analogy breaks down a bit for ranges (they don't look like a function call), but it isn't hard to imagine such a unification.
A more important consideration was allowing elements of a container to be directly iterated on. That is, the type of the loop parameter could be the element of a container (that is, anything at all) rather than the cursor to the item.
That would require the element type to be included in the generic somewhere, as well as an accessor function.
But the important consideration would be how the iterated element actually is handled. The programmer may only want read-only access to the elements, or they may want to change a copy of the element, or they my need to change the element in place. It is difficult to make these decisions for the programmer, and providing a large family of iterators quickly gets complicated.
Given all of this, and that existing Ada iteration is on read-only index values (which are can be considered a restricted form of cursors), we only provide iterator on cursors, which are treated as constants within the loop. This way the programmer can pick the most appropriate way to access the actual element: via Element, Update_Element, or (hopefully) some form of Modifiable_Element.
This feature could easily be used in the existing Ada.Containers packages. We give a suggested implementation for Ada.Containers.Doubly_Linked_Lists; the other containers could be similar.
The start of the visible part of Ada.Containers.Doubly_Linked_Lists would be modified as follows:
with Ada.Iterator_Interfaces; generic type Element_Type is private; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Doubly_Linked_Lists is pragma Preelaborate(Doubly_Linked_Lists); pragma Remote_Types(Doubly_Linked_Lists);
package Cursors is type Cursor is private; pragma Preelaborable_Initialization(Cursor);
No_Element : constant Cursor; private ... -- not specified by the language end Cursors; subtype Cursor is Cursors.Cursor; No_Element : Cursor renames Cursors.No_Element;
package Iterators is new Ada.Iterator_Interfaces (Cursors, No_Element);
type List is new Iterators with private; pragma Preelaborable_Initialization(List);
Empty_List : constant List;
Note that the changes here are the addition of the nested package Cursors, the addition of the instantiation, and the subtype and renames to eliminate the effect of the nested package. The majority of the package is unchanged.
After the existing procedure Reverse_Iterate, we would add a new function:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class;
This would be described as
Iterate returns an object that can be used in a loop_statement to loop with a loop parameter designating each node in Container, starting with the first node and moving the cursor as per the Next function if the reserved word reverse is not used in the user_loop_parameter_statement, and starting with the last node and moving the cursor as per the Previous function otherwise. Program_Error is propagated if any operation (in particular, the sequence_of_statements of the loop_statement whose iterator_expression denotes this object) tampers with the cursors of Container while the returned object exists.
This could be implemented by defining a controlled type in the private part of Ada.Containers.Doubly_Linked_Lists:
type Our_List_Iterator is new Ada.Controlled.Limited_Controlled and List_Iterators.Reversible_Iterator with record Our_List : access List; end record; function First (Object : Our_List_Iterator) return Cursor; function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor; function Last (Object : Our_List_Iterator) return Cursor; function Previous (Object : Our_List_Iterator; Position : Cursor) return Cursor; procedure Finalize (Object : in out Our_List_Iterator);
The implementation of these operations would be something like:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class is begin return Iter:Our_List_Iterator do Iter.Our_List := Container'Unchecked_Access; -- This is safe unless someone does an Unchecked_Deallocation on -- the container in the loop; we don't try to protect against that -- in other cases for Ada.Containers, so we won't here, either. -- Set a tampering context for Iter.Our_List.all. end return; end Iterate;
function First (Object : Our_List_Iterator) return Cursor is begin return Object.Our_List.First; end First;
function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor is begin return Next (Position); end Next;
-- Similar implementations for Last and Previous.
procedure Finalize (Object : in out Our_List_Iterator) is begin -- Clear the tampering context for Object.Our_List. end Finalize;
Note that we didn't add these interfaces to the actual container types. That was because we wanted these iterators to be a tampering context (thus ensuring that they terminate and do not lead to erroneous execution). Users would be very surprised if a "safe" construct like a for loop caused erroneous execution (which will happen if the node designated by the loop parameter is deleted).
The function Iterate would normally be used directly in a loop:
for C in My_List.Iterate loop ... Element (C) ... end loop;
We could also define a partial list iterator:
function Partial_Iterate (Container : in List; Start : in Cursor; Finish : in Cursor := No_Element) return List_Iterators.Reversible_Iterator'Class;
where Start and Finish (if not No_Element) would have to be cursors for Container. In this case, the iteration would run from Start to Finish (or No_Element, if Finish is not on the list after/before Start) forward or in reverse. (This would also be a tampering context as above.)
We would use Partial_Iterator directly in a loop:
for C in Partial_Iterate (My_List, Start, Finish) loop ... Element (C) ... end loop;
Some reviewers have said that they would prefer the iterator to work directly on the container object. This would only eliminate a single selected name, which does not seem worth the problems that would be introduced. The example above means that an iterator is written:
for C in My_List.Iterate loop
Only the truly lazy would find a significant difference from:
for C in My_List loop
Moreover, we are planning (as of this writing) to use a similar mechanism to provide safe accessors. That mechanism also requires writing an extra selector; it would seem bizarre to work hard to eliminate that here, but still handle use it for accessors.
Finally, a direct syntax would not allow nested loops (as different termination code is needed for each loop) unless significant additional magic is defined. Essentially, something like function Iterate would have to be defined internally to the interface to provide the needed temporary space/termination object. This would require far more mechanism than the current proposal.
About the only thing going for the direct syntax is that it would be a bit easier to understand. However, usage of the current proposal is so easy that once a programmer sees an example, they will immediately understand how it works. Understanding the details would rarely be needed.
!ACATS test

From: Tucker Taft
Sent: Tuesday, December  8, 2009  5:01 PM

Here is a new variant for AI05-139 on user-defined iterators, etc.  [This is
version /01 of this AI - ED.] It is based on all the earlier proposals,
and incorporates the notes on iterators I passed around at the last ARG meeting.

I removed the wording, and didn't update the discussion, so you will just
have to focus on the !proposal section, which includes some simple examples.


Questions? Ask the ACAA Technical Agent