Version 1.1 of ai05s/ai05-0139-2.txt
!standard 5.5 09-12-08 AI05-0139-2/01
!class Amendment 09-02-13
!status work item 09-02-13
!status received 09-01-19
!priority Medium
!difficulty Medium
!subject Syntactic sugar for accessors, containers, and iterators
!summary
(See proposal.)
!problem
Using an iterator over one of the Ada 2005 containers libraries requires writing
a subprogram which gets called repeatedly with each element of the container.
That effectively requires writing the loop body out-of-line from the loop,
making the code inside-out of its normal flow, requiring the writing of a lot of
extra text, and thus making it much harder to read and understand.
A better mechanism is required.
In addition, using the notion of "accessors" or "references" as defined in AI05-0142
is syntactically awkward. Finally, it is often clearer to iterate over the elements
of a container (or array) directly, rather than iterating over a cursor (or index)
into the container (or array).
Using accessors for user-defined dereference (rather than simple access values
as proposed in AI05-141) has the potential to provide a safer mechanism that
might be used for persistence, relocating garbage collection, detection of
dangling references, etc. However, using accessors would be syntactically
painful compared to a simple ".all."
In all these cases, a modest amount of "syntactic sugar" might make the use
of accessors, containers, and iterators more natural and easier to read.
!proposal
Define a set of "magic" generic units which are linked to
some appropriate syntactic sugar for accessors, containers, and iterators:
generic package Iterator_Interfaces defines a pair of interfaces:
generic
type Cursor is private;
No_Element : in Cursor;
package Ada.Iterator_Interfaces is
type Basic_Iterator is limited interface;
function First (Object : Basic_Iterator) return Cursor;
function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
type Reversible_Iterator is limited interface and Basic_Iterator;
function Last (Object : Reversible_Iterator) return Cursor;
function Previous (Object : Reversible_Iterator; Position : Cursor) return Cursor;
end Ada.Iterator_Interfaces;
Note that the routines defined in the magic generic generally match those used
in iteration for the Ada.Containers packages, and their meaning is the same. The
only difference is that we've added an iterator object parameter to the Next and
Previous functions (which is necessary to make them part of the interface).
The interface Basic_Iterator defines a user-defined iterator for the forward direction;
the interface Reversible_Iterator defines a user-defined iterator for both the forward and
reverse direction.
The function First is expected to deliver the first cursor value for the iterator iterating
in the forward direction, or return No_Element if no cursor value is available. The function
Next is expected to deliver the next cursor value for the iterator iterating
in the forward direction given the previous cursor value returned from a call to First or
Next, or return No_Element if no additional cursor values are available.
The function Last is expected to deliver the last cursor value for the iterator iterating
in the reverse direction, or return No_Element if no cursor value is available. The function
Previous is expected to deliver the next cursor value for the iterator iterating
in the reverse direction given the previous cursor value returned from a call to Last or
Previous, or return No_Element if no additional cursor values are available.
The generic package Element_References defines two tagged types which can be
used for accessing elements of a container or similar structure:
generic
type Element_Type(<>) is limited private;
package Ada.Element_References is
type Element_Reference(Data : access Element_Type) is
new Ada.Finalization.Limited_Controlled with private;
type Element_Constant_Reference(Data : access constant Element_Type) is
new Ada.Finalization.Limited_Controlled with private;
end Ada.Element_References;
A function that returns a result of a type descended from
Element_Reference or Element_Constant_Reference has ".Data.all"
implicitly applied to its result at the point of call, if necessary to
satisfy overload resolution. This allows the convenient use of accessor
functions both on the left-hand-side and right-hand-side of an
assignment, and in other contexts. For example:
function Lookup(Tab : Hash_Table; Key : String) return Element_Reference;
Lookup(T1, "hello") := Lookup(T1, "goodbye");
This last line is equivalent to:
Lookup(T1, "hello").Data.all := Lookup(T1, "goodbye").Data.all;
Our final "magic" generic package, Container_Interfaces, defines an
interface to be used in conjunction with the Reversible_Iterator defined
in Interator_Interfaces, and the element reference types defined in
Element_References, as the basis for a container that takes full
advantage of our new syntactic sugar (which is further defined below):
generic
with package Iterators is new Ada.Iterator_Interfaces(<>);
type Iterator_Type is new Iterators.Reversible_Iterator with private;
with package References is new Ada.Element_References(<>);
type Element_Reference(Data : access References.Element_Type) is
new Element_References.Element_Reference with private;
type Element_Constant_Reference(Data : access constant Element_Type) is
new Element_References.Element_Constant_Reference with private;
package Ada.Container_Interfaces is
type Container is limited interface;
function Default_Iterator(C : Container) return Iterator_Type is abstract;
function Element_Ref(C : aliased in out Container; Cursor : Iterators.Cursor)
return Element_Reference is abstract;
function Element_CRef(C : aliased in Container; Cursor : Iterators.Cursor)
return Element_Constant_Reference is abstract;
end Ada.Container_Interfaces;
Indexed_component syntax may be used with any object of a type descended
from the Container interface, as though it were a single-dimensional array
indexed by the Cursor type. It is equivalent to a call on Element_Ref in any
context that requires a variable; otherwise it is equivalent to a call on
Element_CRef. For example:
V : Int_Vectors.Vector;
C1, C2 : Int_Vectors.Cursor;
...
C1 := ...
C2 := ...
V(C1) := V(C2);
The last line is equivalent to:
Element_Ref(V, C1).Data.all := Element_CRef(V, C2).Data.all;
These three generic packages would typically be instantiated as follows:
type Cursor_Type is ...
package Iterators is new Ada.Iterator_Interfaces(Cursor_Type, No_Element);
type Iterator_Type is new Iterators.Reversible_Iterator with ...
package References is new Ada.Element_References(Element_Type);
type Element_Reference is
new References.Element_Reference with ...
type Element_Constant_Reference is
new References.Element_Constant_Reference with ...
package Containers is new
Ada.Container_Interfaces(Iterators, Iterator_Type,
References, Element_Reference, Element_Constant_Reference);
type Container_Type is new Containers.Container with ...
NEW ITERATOR SYNTAX
Given these three generic packages and their associated types,
we then add two new kinds of "for" loop parameter specifications
which provide for iteration over containers (and arrays) in a natural way:
defining_identifier IN [REVERSE] iterator_expression
defining_identifier [: subtype_indication] OF container_expression
The first requires an expression returning an object of a type
descended from Basic_Iterator, or Reversible_Iterator if REVERSE appears.
The defining_identifier is of the corresponding Cursor type, and can
be used to index into the container.
The second requires an expression returning an array, or an object
of a type descended from the Container interface defined above.
An example of the first iterator would be:
V : Int_Vectors.Vector;
...
for C in V.Iterate loop
V(C) := V(C) + 1;
--
--
end loop;
Some examples of the second iterator would be:
Arr : array(1..10) of Natural := ...
...
for X : Natural of Arr loop
--
--
--
X := X + 1;
end loop;
--
Vec : Int_Vectors.Vector;
...
for X of Vec loop
--
--
--
X := X + 1;
end loop;
The "of" syntax, suggested by J.P. Rosen, seems to help identify X
as an element of the array/container rather than as an index or cursor
in the sequence defined by the discrete range or iterator.
USER DEFINED DEREFERENCE (AI05-141)
The generic package Element_References might also be relevant to the
notion of a user-defined "dereference" operation suggested in AI05-141.
One of the significant problems with AI05-141 as it stands is that there
is no operation called when the dereference is "completed." If we
presume that the purpose of a user-defined dereference might be to
implement some kind of software paging or persistence, then the page on
which the designated object resides needs to be "pinned" in memory until
the dereference is complete. But there is no user-defined cleanup
routine provided in AI05-141. The Element_Reference and
Element_Constant_Reference could provide for that, because they are
derived from Limited_Controlled.
So the suggestion is that, at least conceptually, there is an element
reference object created, where the access discriminant comes from the
result of the user-defined dereference, and when the resulting element
reference object goes away, a user-defined cleanup routine would be
called, perhaps to "unpin" the page on which the designated object
resides.
For practical reasons, rather than having the implementor of the storage
pool provide extensions of Element_Reference and
Element_Constant_Reference, they would instead just provide dereference
operations similar to what is suggested in AI05-141, which takes in an
"address" that might in fact not really be a machine address but rather
an address-sized "locator" for the object that was returned by Allocate,
and produces as a result a true machine address, suitable for use as the
access discriminant of an element reference object. There would be two
of these operations, one for dereferencing an access-to-variable value,
and one for dereferencing an access-to-constant value, and the work to
be done might be quite different in the two cases (presuming, for
instance, that the underlying storage pool is doing software paging).
In addition to the dereference operations, the implementor of the
storage pool would provide two "cleanup" routines, which would be called
when the hypothetical element reference object goes out of scope, again
one for access-to-variable and one for access-to-constant.
The import and export operations suggested in AI05-141 might still be
provided for conversion to/from longer-lived access values.
AI05-141 suggested that the dereference operation be accomplished
through an IN-OUT parameter of type System.Address, allowing it to be
the null procedure by default. Given the number of operations we
are now talking about, we would suggest defining a distinct
("magic") descendant of Root_Storage_Pool that had these extra
operations, two of which would more naturally be functions than procedures. The
cleanup operations would ideally take three parameters, the storage
pool, the original access value (which might not be a real machine
address), and the "real" machine address produced by the dereference
operation. Having both the original and the dereferenced value should
simplify the cleanup operation.
Here is a possible package System.Managed_Storage_Pools:
with System.Storage_Pools;
package System.Managed_Storage_Pools is
type Root_Managed_Storage_Pool is abstract new
Storage_Pools.Root_Storage_Pool with private;
--
--
function Dereference(Pool : access Root_Managed_Storage_Pool;
Access_Value : System.Address) return System.Address is abstract;
function Constant_Dereference(Pool : access Root_Managed_Storage_Pool;
Access_Constant_Value : System.Address) return System.Address is abstract;
procedure Cleanup_Dereference(Pool : access Root_Managed_Storage_Pool;
Access_Value : System.Address; Address_Value : System.Address) is null;
procedure Cleanup_Constant_Dereference(Pool : access Root_Managed_Storage_Pool;
Access_Value : System.Address; Address_Value : System.Address) is null;
function Export(Pool : access Root_Managed_Storage_Pool;
Access_Value : System.Address) return System.Address is abstract;
function Import(Pool : access Root_Managed_Storage_Pool;
Address_Value : System.Address) return System.Address is abstract;
private
--
end System.Managed_Storage_Pools;
Note that the use of element references is completely behind the scenes with
this approach. These access values work just like "regular" access values,
but behind the scenes:
* on each dereference or conversion to an access parameter or
access discriminant they get temporarily made available as "true"
addresses via the user-defined dereference routines, and then get
cleaned up.
* on conversion to an access type that is not for an access parameter
or access discriminant, if the storage pool of the new access type
is different, they get permanently converted into a "normal"
address value via the Export function (which might raise PROGRAM_ERROR).
* on conversion from an access type of a different storage pool, the
new access values are produced by calling the Import function, which might
raise PROGRAM_ERROR if importing from another storage pool is not
supported.
!wording [REMOVED FOR NOW]
!discussion [NOT UPDATED -- SEE PROPOSAL FOR SOME NEWER DISCUSSION]
This is based on the earlier proposal and discussion recorded in AC-0112.TXT.
This proposal was modified to address the comments made on that one, and has
been simplified considerably.
Note that the iterator interface can be directly applied to a container type
(allowing it to be directly iterated) or it can be part of a separate function
(which would allow adding additional parameters to the iterator and/or adding
additional storage for the iterators). By using interfaces, we give maximum
flexibility to the programmer that is using these iterators.
If the creator of the iterator needs extra storage to accomplish the iteration,
they can include it in the type of the iterator.
If the creator of the iterator needs to get control when the loop is exited
prematurely, they can define a controlled iterator type. In that case,
(presuming that the iterator_expression was a function call) when the loop is
exited, the Finalization routine for the iterator object will be called.
We provide a form for reverse iterators. Reverse iteration makes sense for many
kinds of containers (vectors and lists, for instance). Given that the syntax
already exists and every Ada programmer is familiar with it, not providing it
would quickly become one of the most common frustrations of programming in Ada.
That seems silly.
We could have simplified the proposal by requiring that all iterators support
Last and Previous, but iterators where reverse iteration does not make sense
would have to raise an exception when Last is called. The proposed features, in
contrast, make a compile-time determination as to whether reverse is allowed.
That will detect errors earlier; it generally is better to detect errors at
compile-time if possible.
It might appear that this proposal would require some solution to the
instantiation for private types problem, but as the examples below show, this
is not true.
Note that the actual iterator type can be either limited or non-limited. The
value of a limited iterator object is that an iterator implementation can
assume (presuming that it only exports functions and not types) that the
iterator object is created and destroyed with the loop. This allows safe
cleanup operations when the loop is exited (as in the examples below).
If that capability is not needed, an iterator implementation allow reuse of
iterator objects (especially if no temporary storage is needed).
Note that if temporary storage or loop exit actions are needed, the iterator
object must be separate from the container object that it is iterating
on. Otherwise nested loops on the same container would not work.
for I in My_Container.Iterator loop
for J in My_Container.Iterator loop
...
end loop;
end loop;
The I and J loops need separate temporary storage, which must be provided by
the iterator object (the loop can only provide the loop parameter cursor object).
Issues:
This proposal requires some solution to the modify-in-place problem for the
containers packages (see the family of AI05-0142 proposals). Without that,
modifying an element in an iterator would require writing an extra subprogram,
meaning the new syntax would not have gained anything.
The use of a controlled type for an iterator in this proposal would require
bounded containers to be preelaborable units, rather than pure units. We add a
prohibition on the actual container from being controlled, so that finalization
adverse users could still use the bounded containers (but obviously not the
iterators). This seems OK, given that the current accessor proposal (AI05-0142-4)
has similar requirements.
It would be nice for an iterator implementation to be able to prevent the
creation of iterator objects not associated with a loop. For instance, given
the implementation of the example below:
My_Iter : Some_Instance.Basic_Iterator'Class := My_List.Iterator;
would have the effect setting tampering on My_List and keeping it set until
until the object is destroyed (potentially a long time in the future). But this
is not the natural thing to do -- it requires a lot more typing than the intended
usage -- and no major definitional problems are encountered in this case. We
would need a lot more mechanism to prevent the creation of objects outside of
the loop context, so we do not try to prevent this misuse.
Alternatives considered:
One option considered was including the name of the cursor type somewhere in the
syntax. Something like:
defining_identifier : subtype_mark in [reverse] iterator_expression
That was not done mainly because differs from the existing syntax for no clear
benefit. It would make the type of the identifier clearer, but it also would
mean that switching from a discrete iterator to a container one would require
some syntax changes that don't really fit into the model. (One solution to that
problem would be to optionally add the type name to the discrete for loop as
well, but that seems like a bunch of additional changes for little value.)
It should be noted that with this formulation, the existing discrete iterators
could be considered just a predefined instantiation of this magic generic for
discrete types. The analogy breaks down a bit for ranges (they don't look like a
function call), but it isn't hard to imagine such a unification.
A more important consideration was allowing elements of a container to be
directly iterated on. That is, the type of the loop parameter could be the
element of a container (that is, anything at all) rather than the cursor to the
item.
That would require the element type to be included in the generic somewhere, as
well as an accessor function.
But the important consideration would be how the iterated element actually is
handled. The programmer may only want read-only access to the elements, or they
may want to change a copy of the element, or they my need to change the element
in place. It is difficult to make these decisions for the programmer, and
providing a large family of iterators quickly gets complicated.
Given all of this, and that existing Ada iteration is on read-only index values
(which are can be considered a restricted form of cursors), we only provide
iterator on cursors, which are treated as constants within the loop. This way
the programmer can pick the most appropriate way to access the actual element:
via Element, Update_Element, or (hopefully) some form of Modifiable_Element.
!examples
This feature could easily be used in the existing Ada.Containers packages. We
give a suggested implementation for Ada.Containers.Doubly_Linked_Lists; the
other containers could be similar.
The start of the visible part of Ada.Containers.Doubly_Linked_Lists would be
modified as follows:
with Ada.Iterator_Interfaces;
generic
type Element_Type is private;
with function "=" (Left, Right : Element_Type) return Boolean is <>;
package Ada.Containers.Doubly_Linked_Lists is
pragma Preelaborate(Doubly_Linked_Lists);
pragma Remote_Types(Doubly_Linked_Lists);
package Cursors is
type Cursor is private;
pragma Preelaborable_Initialization(Cursor);
No_Element : constant Cursor;
private
... --
end Cursors;
subtype Cursor is Cursors.Cursor;
No_Element : Cursor renames Cursors.No_Element;
package Iterators is new
Ada.Iterator_Interfaces (Cursors, No_Element);
type List is new Iterators with private;
pragma Preelaborable_Initialization(List);
Empty_List : constant List;
...
Note that the changes here are the addition of the nested package Cursors,
the addition of the instantiation, and the subtype and renames to eliminate the
effect of the nested package. The majority of the package is unchanged.
After the existing procedure Reverse_Iterate, we would add a new function:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class;
This would be described as
Iterate returns an object that can be used in a loop_statement to loop
with a loop parameter designating each node in Container, starting with the
first node and moving the cursor as per the Next function if the reserved word
reverse is not used in the user_loop_parameter_statement, and starting
with the last node and moving the cursor as per the Previous function otherwise.
Program_Error is propagated if any operation (in particular, the sequence_of_statements
of the loop_statement whose iterator_expression denotes this object) tampers with
the cursors of Container while the returned object exists.
This could be implemented by defining a controlled type in the private part of
Ada.Containers.Doubly_Linked_Lists:
type Our_List_Iterator is new Ada.Controlled.Limited_Controlled and
List_Iterators.Reversible_Iterator with record
Our_List : access List;
end record;
function First (Object : Our_List_Iterator) return Cursor;
function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor;
function Last (Object : Our_List_Iterator) return Cursor;
function Previous (Object : Our_List_Iterator; Position : Cursor) return Cursor;
procedure Finalize (Object : in out Our_List_Iterator);
The implementation of these operations would be something like:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class is
begin
return Iter:Our_List_Iterator do
Iter.Our_List := Container'Unchecked_Access;
--
--
--
--
end return;
end Iterate;
function First (Object : Our_List_Iterator) return Cursor is
begin
return Object.Our_List.First;
end First;
function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor is
begin
return Next (Position);
end Next;
--
procedure Finalize (Object : in out Our_List_Iterator) is
begin
--
end Finalize;
Note that we didn't add these interfaces to the actual container types. That was
because we wanted these iterators to be a tampering context (thus ensuring that
they terminate and do not lead to erroneous execution). Users would be very
surprised if a "safe" construct like a for loop caused erroneous execution
(which will happen if the node designated by the loop parameter is deleted).
The function Iterate would normally be used directly in a loop:
for C in My_List.Iterate loop
... Element (C) ...
end loop;
We could also define a partial list iterator:
function Partial_Iterate (Container : in List;
Start : in Cursor;
Finish : in Cursor := No_Element)
return List_Iterators.Reversible_Iterator'Class;
where Start and Finish (if not No_Element) would have to be cursors for
Container. In this case, the iteration would run from Start to Finish (or
No_Element, if Finish is not on the list after/before Start) forward or in
reverse. (This would also be a tampering context as above.)
We would use Partial_Iterator directly in a loop:
for C in Partial_Iterate (My_List, Start, Finish) loop
... Element (C) ...
end loop;
Some reviewers have said that they would prefer the iterator to work
directly on the container object. This would only eliminate a single
selected name, which does not seem worth the problems that would be
introduced. The example above means that an iterator is written:
for C in My_List.Iterate loop
Only the truly lazy would find a significant difference from:
for C in My_List loop
Moreover, we are planning (as of this writing) to use a similar
mechanism to provide safe accessors. That mechanism also requires
writing an extra selector; it would seem bizarre to work hard to eliminate
that here, but still handle use it for accessors.
Finally, a direct syntax would not allow nested loops (as different
termination code is needed for each loop) unless significant additional
magic is defined. Essentially, something like function Iterate would
have to be defined internally to the interface to provide the needed
temporary space/termination object. This would require far more
mechanism than the current proposal.
About the only thing going for the direct syntax is that it would be a
bit easier to understand. However, usage of the current proposal is so
easy that once a programmer sees an example, they will immediately
understand how it works. Understanding the details would rarely be needed.
!ACATS test
!appendix
From: Tucker Taft
Sent: Tuesday, December 8, 2009 5:01 PM
Here is a new variant for AI05-139 on user-defined iterators, etc. [This is
version /01 of this AI - ED.] It is based on all the earlier proposals,
and incorporates the notes on iterators I passed around at the last ARG meeting.
I removed the wording, and didn't update the discussion, so you will just
have to focus on the !proposal section, which includes some simple examples.
****************************************************************
Questions? Ask the ACAA Technical Agent