Version 1.7 of ai05s/ai05-0139-1.txt

Unformatted version of ai05s/ai05-0139-1.txt version 1.7
Other versions for file ai05s/ai05-0139-1.txt

!standard 5.5          09-06-08 AI05-0139-1/02
!class Amendment 09-02-13
!status No Action (9-0-0) 10-02-27
!status work item 09-02-13
!status received 09-01-19
!priority Medium
!difficulty Medium
!subject User-defined iterators
!summary
(See proposal.)
!problem
Using an iterator over one of the Ada 2005 containers libraries requires writing a subprogram which gets called repeatedly with each element of the container. That effectively requires writing the loop body out-of-line from the loop, making the code inside-out of its normal flow, requiring the writing of a lot of extra text, and thus making it much harder to read and understand.
A better mechanism is required.
!proposal
Define a magic generic unit that defines a pair of interfaces:
generic type Cursor is private; No_Element : in Cursor; package Ada.Iterator_Interfaces is
type Basic_Iterator is limited interface; function First (Object : Basic_Iterator) return Cursor; function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
type Reversible_Iterator is limited interface and Basic_Iterator; function Last (Object : Reversible_Iterator) return Cursor; function Previous (Object : Reversible_Iterator; Position : Cursor) return Cursor;
end Ada.Iterator_Interfaces;
Note that the routines defined in the magic generic generally match those used in iteration for the Ada.Containers packages, and their meaning is the same. The only difference is that we've added an iterator object parameter to the Next and Previous functions (which is necessary to make them part of the interface).
We then add a new kind of loop parameter specification:
defining_identifier in [reverse] iterator_expression
See the wording below for the detailed rules.
!wording
Replace 5.5(3) by:
iteration_scheme := while condition | for loop_parameter_specification | for user_loop_parameter_specification
[Author's note: We need a different kind of loop parameter syntax so that 5.5(6) doesn't get triggered for user-defined iterators, as that would give it the wrong sort of type; this also reduces forward references just to syntax, which is generally accepted in the Standard.]
Modify 5.5(9): For execution of a loop_statement with a for iteration_scheme {with a loop_parameter_specification}, the ...
[Author's note: More detail to eliminate confusion between the two kinds of for loop.]
5.5.1 The generic package Iterator_Interfaces
The generic package Iterator_Interfaces provide a template for the definition of user-defined iterators.
generic type Cursor is private; No_Element : in Cursor; package Ada.Iterator_Interfaces is
type Basic_Iterator is limited interface; function First (Object : Basic_Iterator) return Cursor; function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
type Reversible_Iterator is limited interface and Basic_Iterator; function Last (Object : Reversible_Iterator) return Cursor; function Previous (Object : Reversible_Iterator; Position : Cursor) return Cursor;
end Ada.Iterator_Interfaces;
The interface Basic_Iterator defines a user-defined iterator for the forward direction; the interface Reversible_Iterator defines a user-defined iterator for both the forward and reverse direction.
The function First is expected to deliver the first cursor value for the iterator iterating in the forward direction, or return No_Element if no cursor value is available. The function Next is expected to deliver the next cursor value for the iterator iterating in the forward direction given the previous cursor value returned from a call to First or Next, or return No_Element if no additional cursor values are available.
The function Last is expected to deliver the last cursor value for the iterator iterating in the reverse direction, or return No_Element if no cursor value is available. The function Previous is expected to deliver the next cursor value for the iterator iterating in the reverse direction given the previous cursor value returned from a call to Last or Previous, or return No_Element if no additional cursor values are available.
5.5.2 User-defined iteration
[Author's note: This is a separate clause, so that we don't need a forward reference to the generic package.]
Syntax
user_loop_parameter_specification ::= defining_identifier in [reverse] iterator_expression
Name Resolution Rules
The iterator_expression of a user_loop_parameter_specification is expected to be of any tagged type.
Legality Rules
The iterator_expression shall be of a type covered by exactly one type Basic_Iterator'Class of an instance of Iterator_Interfaces.
AARM Reason: It would be possible for a single type to inherit from multiple instances of Iterator_Interfaces. However, since the type of the loop parameter is determined by the instance of the type Basic_Iterator'Class, we can have only one such interface in order for the loop parameter to have an unambiguous type.
If the reserved word reverse is present, the iterator_expression shall be of a type covered by type Reversible_Iterator'Class of an instance of Iterator_Interfaces.
Static Semantics
A user_loop_parameter_specification declares a loop parameter, which is an object whose subtype is that of the actual type given for the instance of Iterator_Interfaces that contains the Basic_Iterator interface from which the type of the iterator_expression is descended.
[Author's note: Wording is as similar as possible to 5.5(6).]
Dynamic Semantics
For the execution of a loop_statement with a for iteration_scheme with a user_loop_parameter_specification, the iterator_expression is first evaluated; the result is treated as an implicit by-reference formal parameter I of the loop_statement.
[Author's note: I don't like this wording much, but it has the correct semantics. The intent is that this evaluation work just like the evaluation of a by-reference parameter - see below.]
AARM Ramification: If the iterator_expression is a function call, a temporary object will be created here. A statement is a master (see 7.6.1), so the master of this temporary object (if any) is the loop statement. That means that the temporary object is finalized when the loop is left (including by transfers of control), and designers of user-defined iterators can depend on this behavior.
The sequence_of_statements of statements is executed once for each value returned from the primitive operations of the type of the iterator_expression (or until the loop is left as the consequence of a transfer of control). Prior to each such iteration, a value is assigned to the loop parameter as follows:
* For the first iteration of a loop that does not include the reserved word *reverse*, the loop parameter is initialized from the result of a call of I.First;
* For following iterations of a loop that does not include the reserved word *reverse*, the loop parameter is initialized from the result of a call of I.Next passed the current value of the loop parameter;
* For the first iteration of a loop that includes the reserved word *reverse*, the loop parameter is initialized from the result of a call of I.Last;
* For following iterations of a loop that includes the reserved word *reverse*, the loop parameter is initialized from the result of a call of I.Previous passed the current value of the loop parameter.
If any of these function calls return the value No_Element (as specified in the appropriate instance of Iterator_Interfaces), the loop is completed and left without executing the sequence_of_statements. The loop_statement propagates any exception propagated by these function calls.
[Author's note: I did not provide any special case if the inherited operations are not implemented as suggested in 5.5.1. That means that an implementation must actually call the routines as described (in case they do something weird); only as-if optimizations are allowed. Since the functions can only return a cursor (including No_Element) or raise an exception, there is no need to have a special case for misbehaving functions.]
Add after 7.5(2.7/2):
* the iterator_expression of a user_loop_parameter_specification of a loop_statement (see 5.5.2);
** Containers updates TBD ** (see !examples)
!discussion
This is based on the earlier proposal and discussion recorded in AC-0112.TXT. This proposal was modified to address the comments made on that one, and has been simplified considerably.
Note that the iterator interface can be directly applied to a container type (allowing it to be directly iterated) or it can be part of a separate function (which would allow adding additional parameters to the iterator and/or adding additional storage for the iterators). By using interfaces, we give maximum flexibility to the programmer that is using these iterators.
If the creator of the iterator needs extra storage to accomplish the iteration, they can include it in the type of the iterator.
If the creator of the iterator needs to get control when the loop is exited prematurely, they can define a controlled iterator type. In that case, (presuming that the iterator_expression was a function call) when the loop is exited, the Finalization routine for the iterator object will be called.
We provide a form for reverse iterators. Reverse iteration makes sense for many kinds of containers (vectors and lists, for instance). Given that the syntax already exists and every Ada programmer is familiar with it, not providing it would quickly become one of the most common frustrations of programming in Ada. That seems silly.
We could have simplified the proposal by requiring that all iterators support Last and Previous, but iterators where reverse iteration does not make sense would have to raise an exception when Last is called. The proposed features, in contrast, make a compile-time determination as to whether reverse is allowed. That will detect errors earlier; it generally is better to detect errors at compile-time if possible.
It might appear that this proposal would require some solution to the instantiation for private types problem, but as the examples below show, this is not true.
Note that the actual iterator type can be either limited or non-limited. The value of a limited iterator object is that an iterator implementation can assume (presuming that it only exports functions and not types) that the iterator object is created and destroyed with the loop. This allows safe cleanup operations when the loop is exited (as in the examples below). If that capability is not needed, an iterator implementation allow reuse of iterator objects (especially if no temporary storage is needed).
Note that if temporary storage or loop exit actions are needed, the iterator object must be separate from the container object that it is iterating on. Otherwise nested loops on the same container would not work.
for I in My_Container.Iterator loop for J in My_Container.Iterator loop ... end loop; end loop;
The I and J loops need separate temporary storage, which must be provided by the iterator object (the loop can only provide the loop parameter cursor object).
Issues:
This proposal requires some solution to the modify-in-place problem for the containers packages (see the family of AI05-0142 proposals). Without that, modifying an element in an iterator would require writing an extra subprogram, meaning the new syntax would not have gained anything.
The use of a controlled type for an iterator in this proposal would require bounded containers to be preelaborable units, rather than pure units. We add a prohibition on the actual container from being controlled, so that finalization adverse users could still use the bounded containers (but obviously not the iterators). This seems OK, given that the current accessor proposal (AI05-0142-4) has similar requirements.
It would be nice for an iterator implementation to be able to prevent the creation of iterator objects not associated with a loop. For instance, given the implementation of the example below:
My_Iter : Some_Instance.Basic_Iterator'Class := My_List.Iterator;
would have the effect setting tampering on My_List and keeping it set until until the object is destroyed (potentially a long time in the future). But this is not the natural thing to do -- it requires a lot more typing than the intended usage -- and no major definitional problems are encountered in this case. We would need a lot more mechanism to prevent the creation of objects outside of the loop context, so we do not try to prevent this misuse.
Alternatives considered:
One option considered was including the name of the cursor type somewhere in the syntax. Something like:
defining_identifier : subtype_mark in [reverse] iterator_expression
That was not done mainly because differs from the existing syntax for no clear benefit. It would make the type of the identifier clearer, but it also would mean that switching from a discrete iterator to a container one would require some syntax changes that don't really fit into the model. (One solution to that problem would be to optionally add the type name to the discrete for loop as well, but that seems like a bunch of additional changes for little value.)
It should be noted that with this formulation, the existing discrete iterators could be considered just a predefined instantiation of this magic generic for discrete types. The analogy breaks down a bit for ranges (they don't look like a function call), but it isn't hard to imagine such a unification.
A more important consideration was allowing elements of a container to be directly iterated on. That is, the type of the loop parameter could be the element of a container (that is, anything at all) rather than the cursor to the item.
That would require the element type to be included in the generic somewhere, as well as an accessor function.
But the important consideration would be how the iterated element actually is handled. The programmer may only want read-only access to the elements, or they may want to change a copy of the element, or they my need to change the element in place. It is difficult to make these decisions for the programmer, and providing a large family of iterators quickly gets complicated.
Given all of this, and that existing Ada iteration is on read-only index values (which are can be considered a restricted form of cursors), we only provide iterator on cursors, which are treated as constants within the loop. This way the programmer can pick the most appropriate way to access the actual element: via Element, Update_Element, or (hopefully) some form of Modifiable_Element.
!examples
This feature could easily be used in the existing Ada.Containers packages. We give a suggested implementation for Ada.Containers.Doubly_Linked_Lists; the other containers could be similar.
The start of the visible part of Ada.Containers.Doubly_Linked_Lists would be modified as follows:
with Ada.Iterator_Interfaces; generic type Element_Type is private; with function "=" (Left, Right : Element_Type) return Boolean is <>; package Ada.Containers.Doubly_Linked_Lists is pragma Preelaborate(Doubly_Linked_Lists); pragma Remote_Types(Doubly_Linked_Lists);
package Cursors is type Cursor is private; pragma Preelaborable_Initialization(Cursor);
No_Element : constant Cursor; private ... -- not specified by the language end Cursors; subtype Cursor is Cursors.Cursor; No_Element : Cursor renames Cursors.No_Element;
package Iterators is new Ada.Iterator_Interfaces (Cursors, No_Element);
type List is new Iterators with private; pragma Preelaborable_Initialization(List);
Empty_List : constant List;
...
Note that the changes here are the addition of the nested package Cursors, the addition of the instantiation, and the subtype and renames to eliminate the effect of the nested package. The majority of the package is unchanged.
After the existing procedure Reverse_Iterate, we would add a new function:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class;
This would be described as
Iterate returns an object that can be used in a loop_statement to loop with a loop parameter designating each node in Container, starting with the first node and moving the cursor as per the Next function if the reserved word reverse is not used in the user_loop_parameter_statement, and starting with the last node and moving the cursor as per the Previous function otherwise. Program_Error is propagated if any operation (in particular, the sequence_of_statements of the loop_statement whose iterator_expression denotes this object) tampers with the cursors of Container while the returned object exists.
This could be implemented by defining a controlled type in the private part of Ada.Containers.Doubly_Linked_Lists:
type Our_List_Iterator is new Ada.Controlled.Limited_Controlled and List_Iterators.Reversible_Iterator with record Our_List : access List; end record; function First (Object : Our_List_Iterator) return Cursor; function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor; function Last (Object : Our_List_Iterator) return Cursor; function Previous (Object : Our_List_Iterator; Position : Cursor) return Cursor; procedure Finalize (Object : in out Our_List_Iterator);
The implementation of these operations would be something like:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class is begin return Iter:Our_List_Iterator do Iter.Our_List := Container'Unchecked_Access; -- This is safe unless someone does an Unchecked_Deallocation on -- the container in the loop; we don't try to protect against that -- in other cases for Ada.Containers, so we won't here, either. -- Set a tampering context for Iter.Our_List.all. end return; end Iterate;
function First (Object : Our_List_Iterator) return Cursor is begin return Object.Our_List.First; end First;
function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor is begin return Next (Position); end Next;
-- Similar implementations for Last and Previous.
procedure Finalize (Object : in out Our_List_Iterator) is begin -- Clear the tampering context for Object.Our_List. end Finalize;
Note that we didn't add these interfaces to the actual container types. That was because we wanted these iterators to be a tampering context (thus ensuring that they terminate and do not lead to erroneous execution). Users would be very surprised if a "safe" construct like a for loop caused erroneous execution (which will happen if the node designated by the loop parameter is deleted).
The function Iterate would normally be used directly in a loop:
for C in My_List.Iterate loop ... Element (C) ... end loop;
We could also define a partial list iterator:
function Partial_Iterate (Container : in List; Start : in Cursor; Finish : in Cursor := No_Element) return List_Iterators.Reversible_Iterator'Class;
where Start and Finish (if not No_Element) would have to be cursors for Container. In this case, the iteration would run from Start to Finish (or No_Element, if Finish is not on the list after/before Start) forward or in reverse. (This would also be a tampering context as above.)
We would use Partial_Iterator directly in a loop:
for C in Partial_Iterate (My_List, Start, Finish) loop ... Element (C) ... end loop;
Some reviewers have said that they would prefer the iterator to work directly on the container object. This would only eliminate a single selected name, which does not seem worth the problems that would be introduced. The example above means that an iterator is written:
for C in My_List.Iterate loop
Only the truly lazy would find a significant difference from:
for C in My_List loop
Moreover, we are planning (as of this writing) to use a similar mechanism to provide safe accessors. That mechanism also requires writing an extra selector; it would seem bizarre to work hard to eliminate that here, but still handle use it for accessors.
Finally, a direct syntax would not allow nested loops (as different termination code is needed for each loop) unless significant additional magic is defined. Essentially, something like function Iterate would have to be defined internally to the interface to provide the needed temporary space/termination object. This would require far more mechanism than the current proposal.
About the only thing going for the direct syntax is that it would be a bit easier to understand. However, usage of the current proposal is so easy that once a programmer sees an example, they will immediately understand how it works. Understanding the details would rarely be needed.
!ACATS test
!appendix

From: Randy Brukardt
Date: Monday, January 19, 2009  11:31 PM

Here's the third of my trial balloons. For those following the recent
comp.lang.ada discussion, this hopefully explains why I said that defining
proper syntactic iterators was "trivial". It least I think this is a very simple
proposal that really has no reason not to be adopted other than the needed
prerequisites.

[This is version /01 of the AI - ED.]

****************************************************************

From: Randy Brukardt
Date: Tuesday, January 20, 2009  12:27 PM

There's a couple of points that I left out of this one yesterday:

---

The use of a controlled type for an iterator would require bounded containers to
be preelaborable units, rather than pure units. We'd still have a prohibition on
the actual container from being controlled, so that finalization adverse users
could still use the bounded containers (but obviously not the iterators).

---

I had said "This proposal requires some solution to the modify-in-place problem
for the containers packages." That was true in my original draft, which tried
put the iterator interface directly on the container type. Since I gave up that
model for Ada.Containers (in order to support tampering), it would be possible
to hide an unsafe accessor in the private part and allow it to be used only in
an element iterator (which, being a tampering context, is safe from most bad
operations). So an access-to-element alternative could in fact be accomplished
safely and practically without having to make unsafe operations available. Doing
so would definitely require a solution to the instantiation on partial types
problem (the current solution could be put into a child package although that
would make the usage unnecessarily messy).

****************************************************************

From: Randy Brukardt
Date: Saturday, February 28, 2009  9:10 PM

At the recent meeting, we had a rather forceful discussion about "safe" vs.
"unsafe" iterators. I said that I would not stand for unsafe iterators on the
predefined containers. (The existing containers have safe iterators as they do
not allow "tampering with cursors" and require checks to prevent that.)

The discussion about what "safe" meant concentrated on the fact that a "safe"
iterator has guaranteed termination. That probably led some of you to think I
was overstating the case for "safe" iterators. Unfortunately, I was too busy
taking notes to be able to come up with the other part of safe iterators: they
can't become erroneous as long as the container object exists.

The following loop (in the absence of a tampering check) is erroneous:

    for C in My_List loop
       if <Complex-expression-that-is-True-for-First(My_List)> then
          Delete_First (My_List);
       end if;
    end loop;

The problem is that the cursor C has become invalid (A.18.3(153-156/2)); the
iterator will then do an Next operation on it, which is erroneous by
A.18.3(157/2). This is really hard to see in this formulation (which is why it
is so dangerous), so lets look at the loop written explicitly:

    declare
       C : Cursor := First (My_List);
    begin
       while C /= No_Element loop
          if <Complex-expression-that-is-True-for-First(My_List)> then
              Delete_First (My_List);
          end if;
          C := Next (C); -- (!)
       end loop;
    end;

Here we can see the use of C after it becomes dangling, but even here it isn't
obvious that it is dangling.

Recall that one could also cause this problem using Delete, Splice, and various
other operations. (Many of those would require making a copy of the cursor,
which hopefully would be a red flag -- but not necessarily.) And the bad
operation could have been called in a subprogram called from three levels of
calls started from this loop - it is not necessarily right in the loop.

Since this form of loop will be the most common in practice (no one is going to
write an old passive iterator if they have this syntax available), it should be
reasonably safe and should not become erroneous easily. Thus a tampering check
(as in the existing procedure Iterate) is needed.

Two caveats to this discussion: (1) It is possible for the container in an
iterator to be destroyed (usually via Unchecked_Deallocation) while an iterator
is running. That is clearly erroneous by existing language rules, and there is
nothing practical that can be done to prevent it. So even a "safe" iterator is
not completely safe. But it seems unusual to be destroying a container that way;
deleting an element is a much more likely operation.

(2) Another way to prevent this particular problem would be to require the use
of "invalid" cursors that reference a container that still exists to raise
Program_Error rather than be erroneous. That, however, is a distributed overhead
that may not be appropriate to require on all containers. Most existing
containers implementations do some form of such a check, but it is not clear
that any cheap scheme for such a check would be sufficient to implement a
language requirement. That's because a cheap check would use some sort of serial
number scheme, and it would always be possible to do a set of operations that
would cause the check to fail to be made. (It would be pretty unlikely that that
would happen.) Requiring such checks would have the effect of making all
containers usage safer (such as the user-written loop in the second example),
and would reduce the iterators issues solely to termination (for which I still
would disagree about avoiding checks but would not be required to bang my shoes
on the desk and stalk out... ;-)

Note that we could only make this requirement in the case covered by
A.18.3(156/2); the other cases have to remain erroneous. So we'd probably have
to define a new cursor state (say "dangling") which was guaranteed to raise
Program_Error.

****************************************************************

From: Matthew Heaney
Date: Tuesday, March  3, 2009  5:48 PM

> Since this form of loop will be the most common in practice (no one is
> going to write an old passive iterator if they have this syntax
> available), it should be reasonably safe and should not become
> erroneous easily. Thus a tampering check (as in the existing procedure
> Iterate) is needed.

I don't know whether this is relevant, but in GNAT this error is detected:

with Ada.Containers.Doubly_Linked_Lists;

procedure Test_Iter is
    package Lists is new Ada.Containers.Doubly_Linked_Lists (Positive);
    use Lists;

    L : List;
    C : Cursor;

begin
    L.Append (1);
    L.Append (2);
    L.Append (3);
    C := L.First;

    while C /= No_Element loop
       if Element (C) = 1 then
          L.Delete_First;
       end if;
       C := Next (C);
    end loop;
end Test_Iter;


$ gnatmake -gnata -gnat05 test_iter
gcc -c -gnata -gnat05 test_iter.adb
gnatbind -x test_iter.ali
gnatlink test_iter.ali

$ ./test_iter
raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : bad cursor in Next

You have to compile with assertions enabled to turn on the detection, but
otherwise it just works, without requiring any extra tampering checks.

In any event, I don't know how you'd implement a tampering check when you're
performing an active iteration.  There is no "stack" as you have when performing
a passive iteration.

****************************************************************

From: Randy Brukardt
Date: Tuesday, March  3, 2009  6:03 PM

> I don't know whether this is relevant, but in GNAT this error is
> detected:
...

Well, one option obviously would be require the detection of this error. But
then there is a distributed overhead (at least I presume so, since you allow
turning the check off).

...
> In any event, I don't know how you'd implement a tampering check when
> you're performing an active iteration.  There is no "stack" as you
> have when performing a passive iteration.

True, but a syntactical "for" loop is much more like a passive iterator than an
active one. The mechanism of iteration is hidden in it (like a passive iterator
but unlike an active one). So the user of it has no reason to know what
operations will cause it to fail to work. (Yes, they could "lift the hood" and
poke around at how the implementation is defined, but surely we want to people
to be able to use the abstraction without doing that.)

My net takeway is that we want "for loop" syntax to work as much like the
passive iterator as possible (with the added bonus that exits are much easier
with the "for loop" syntax). After all, no sane person will ever call the
passive iterator again once the "for loop" syntax is available. Getting much
less safe is going in the wrong direction.

****************************************************************

From: Matthew Heaney
Date: Tuesday, March  3, 2009  6:37 PM

> Well, one option obviously would be require the detection of this
> error. But then there is a distributed overhead (at least I presume
> so, since you allow turning the check off.

There's a function we call anytime a cursor is passed as a parameter (well,
called when assertions are enabled).  If there's a problem the condition is
usually detectable right away (e.g. when deallocating a node, we set the prev
and next pointer to the value of the node being deleted, and later test for this
case).

> True, but a syntactical "for" loop is much more like a passive
> iterator than an active one. The mechanism of iteration is hidden in
> it (like a passive iterator but unlike an active one). So the user of
> it has no reason to know what operations will cause it to fail to
> work. (Yes, they could "lift the hood" and poke around at how the
> implementation is defined, but surely we want to people to be able to
> use the abstraction without doing that.)

If you're talking about adding a first-class language construct for iterating
over containers, then yes, manipulating the tampering bits would work.  (I
assume that the mechanism would give the implementor some kind of hook a la
Controlled.Initialize to indicate that iteration has started.)

The VB(script) runtime works this way.  The container object has a distinguished
_NewEnum method that is called when the loop "elaborates", returning an object
that is sort of a context for the iteration, and the script run-time uses this
to maintain iterator state:

   Dim objFiles, objFile;

   Set objFiles = objFolder.Files

   For Each objFile In objFiles
      'do something
      If objFile.Predicate Then
         Exit For
      End If
   Next

The iteration object ("enumerator") is automatically destroyed when the loop
terminates.

****************************************************************

From: Bob Duff
Date: Tuesday, March  3, 2009  6:34 PM

> At the recent meeting, we had a rather forceful discussion about "safe" vs.
> "unsafe" iterators. I said that I would not stand for unsafe iterators
> on the predefined containers. (The existing containers have safe
> iterators as they do not allow "tampering with cursors" and require
> checks to prevent
> that.)
>
> The discussion about what "safe" meant concentrated on the fact that a
> "safe" iterator has guaranteed termination.

I presume you mean that they should terminate for the predefined containers. And
allow me to create iterators that don't terminate, right?  I mean, infinite data
structures can be useful.

I think I agree with your other comments, on tampering and so forth.

****************************************************************

From: Randy Brukardt
Date: Wednesday, March  4, 2009  1:39 PM

...
> > The discussion about what "safe" meant concentrated on the fact that
> > a "safe" iterator has guaranteed termination.
>
> I presume you mean that they should terminate for the predefined containers.

Correct, I was only talking about the predefined containers.

> And allow me to create iterators that don't terminate, right?
>  I mean, infinite data structures can be useful.

There is nothing that the language could do to guarantee termination.
Personally, I find the use of the keyword "for" with something that does not
terminate abhorrent (see below), but this is clearly no different than defining
"*" to do a divide. So you would be able to do what you want.

One of the great features of Ada is that you can't modify the for loop
parameter. Thus, when you see the keyword "for", you know automatically that the
loop terminates without any further analysis. Indeed, I was taught that the only
practical way to prove loop termination is to show that the loop is equivalent
to a for loop -- and that is a technique that I use (informally) periodically.
(Since 30 years have gone by since I learned that, perhaps some better technique
has been invented.) That technique would not be very useful if for loops
themselves were not guaranteed to terminate.

While it's clear that we can't *force* iterators to be written so they always
terminate (that would require proving the absence of bugs in the definition of
an iterator, which is surely beyond what Ada can do - maybe SPARK could do it,
but it would have to learn about exceptions to do so), surely we want anything
that is language-defined to have that property. And personally, I'll consider
any non-terminating iterators to be broken, but YMMV.

****************************************************************

From: Robert I. Eachus
Date: Wednesday, March  4, 2009  9:30 PM

>Indeed, I was taught that the only practical way to prove loop termination is
>to show that the loop is equivalent to a for loop -- and that is a technique
>that I use (informally) periodically. (Since 30 years have gone by since I
>learned that, perhaps some better technique has been invented.) That technique
>would not be very useful if for loops themselves were not guaranteed to
>terminate.

Off topic, but:I thought it might be of interest to more than Randy:

Yes, there are better methods. The generalization of the for loop approach is to
associate the problem space with a finite set.  If the algorithm visits another
member of the set after a  finite step, and never revisits a set member, the
program will always terminate.(There are a lot of cases where allowing the final
node to be visited twice simplifies the proof, but that is a detail.)

For a trivial example, say you want to find the shortest route through a graph
containing n nodes so that each node is visited once..  The problem space is the
n factorial permutations of n.  Any algorithm that visits members of the problem
space only once, and either terminates there or moves to a new member within a
finite time, always terminates. This proof works for the brute force algorithm
that visits all permutations, and also works for much more complex (and faster)
algorithms that eliminate entire sets of permutations at each step. (At least
for graphs with non-negative distances.) If a solution was allowed to pass
through some nodes more than once, I guess you could generalize to visiting each
point in the solution set at most n times, but an alternative is to renumber the
nodes.  Try adding a count of untraversed adjoining edges to the nodes to get a
larger problem set.

There is another way to prove termination, that became somewhat popular around
1980 with respect to linear programming and other optimization techniques.
Unfortunately for actual computer codes, instead of pure math, it is often very
painful.  Associate a real (not integer or floating-point) variable with the
loop, and find a definition such that both that the value of variable has a
limit, and that each iteration of the loop monotonically  increases (or
decreases) the variable.  Now if the termination condition requires only that
the variable reach a value close to* the limit, the loop terminates.

Many geometric methods to solve linear programming problems use some variation
on this to show termination.  More practically, some actually use it!  Prior to
starting the program, you have no clue as to what the final, correct answer is.
But you do know from the initial conditions that you can specify a volume such
that any volume that small contains a single potential solution. (To put in LP
terms, when the volume is that small the set of slack variables can be easily
read off.)  So you have two loops, one which reduces the volume known to contain
the solution at a rate with a polynomial bound in the parameters, and a second
short loop in the number of inequalities which reads out the solution. ;-)

*Yes, I know I am being mathematically imprecise here. Any real proof will
define both the behavior of the variable near the limit, and the distance
between the termination condition and the actual limit in terms of some common
epsilon.  Obviously, using floating point of some sort rather than real numbers
is where the pain comes in.  The distance between the termination condition and
the actual limit has to be  large relative to the numerical accuracy of the
representation, and no two steps in the loop can  be closer together than about
twice the epsilon of the representation. So a proof using real numbers is just a
starting point.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, May 27, 2009  10:06 PM

[The following is a partial message from another thread, see AI05-0135-1 for
the entire thread. - Editor.]

One example that I've been thinking about is the proposed generic to provide
iterators. If we do decide to define that "magic" generic, we will need to
rearrange the Ada.Containers packages a lot to use it. But that rearrangement
would seem to be incompatible in a number of ways, and limiting on
implementation strategies to boot.

To refresh, assume that we have a magic iterator generic (this is simplified for
this example, as are the following packages) and we have Steve's "integrated
packages":

   generic
       type Cursor is private;
       No_Element : in Cursor;
   package Ada.Iterator_Interfaces is
       type Basic_Iterator is limited interface;
       function First (Object : Basic_Iterator) return Cursor;
       function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
   end Ada.Iterator_Interfaces;

[Note that the exact form of this generic doesn't matter much in terms of what
rearrangements need to be done to the Ada.Containers packages.]

Now, we currently have:

   generic
      type Element_Type is private;
   package Ada.Containers.Doubly_Linked_Lists is
      type List is tagged private;
      type Cursor is private;

      Empty_List : constant List;
      No_Element : constant Cursor;

      function Is_Empty (Container : List) return Boolean;
      procedure Clear (Container : in out List);
      function Element (Position : Cursor)
         return Element_Type;
      ...
   private
      ... -- not specified by the language
   end Ada.Containers.Doubly_Linked_Lists;

In order to add Basic_Iterator visibly to type List, we'll have to rearrange
this into using at least one nested package. The simplest I can come up with is:

   with Ada.Iterator_Interfaces;
   generic
      type Element_Type is private;
   package Ada.Containers.Doubly_Linked_Lists is
      use package Cursors is
         type Cursor is private;
         No_Element : constant Cursor;
      private
         ... -- not specified by the language
      end Cursors;

      package Iterators is new
          Ada.Iterator_Interfaces (Cursors, No_Element);

      type List is new Iterators with private;

      Empty_List : constant List;
      function Is_Empty (Container : List) return Boolean;
      procedure Clear (Container : in out List);
      function Element (Position : Cursor)
         return Element_Type;
      ...
   private
      ... -- not specified by the language
   end Ada.Containers.Doubly_Linked_Lists;

Limited with incompatibilities don't matter in this case, as you can't get a
limited view of a generic unit.

Primitiveness incompatibilities are definitely in evidence, but I do have to
wonder if it matters. In particular, if you derive from type Cursor currently,
you will inherit function Element. But you will no longer inherit that routine
with this rearrangement. I don't know how significant that is (I would suggest
it is not very significant).

Naming incompatibilities don't seem to be a problem with this example. They
could be if we wanted to remove the "integrated" nested package (as someone
could have named it explicitly in a use clause or in an expanded name). That's
not likely to be an issue in this case.

Since there is only two entities (plus the inherited "=") in the nested
packages, we could just rename them and not need any special construct at all.
But we'd still have the incompatibilities. (I did not expect this result when I
started writing this message.)

But there still is a significant issue. While we are not specifying what goes
into the private parts, we do need to be able to write them. A cursor needs a
reference to a node and to the full container (the List in this example). The
node probably could be defined in the private part of package Cursors, but we
can't define a reference to List in the private part -- it hasn't been declared
yet! And it *can't* be declared yet, because it depends on the interface that we
haven't yet created.

One way to workaround this problem is to use a Taft-amendment type in the
private part of Cursors. But we can't complete that with the type List itself;
we'd have to use a derived type. And that is obnoxious, forcing many junk type
conversions. Even using class-wide types won’t get rid of those conversions,
because they would be toward the root of the tree.

Another possible workaround would be to clutter the spec with an incomplete type
List before the nested package. But that could not be completed with the private
extension List, so we would still end up with the same problem.

In conclusion, this rearrangement works, has some client incompatibility (but it
probably isn't important), but forces a major amount of junk type conversions in
the body of the package. Of course, this is only one example, and it would be
nice to look at additional examples in the future.

****************************************************************

From: Tucker Taft
Sent: Thursday, May 28, 2009  2:12 PM

> Another possible workaround would be to clutter the spec with an 
> incomplete type List before the nested package. But that could not be 
> completed with the private extension List, so we would still end up with the
> same problem.

This particular problem seems fixable.  The requirement that an incomplete type
declaration be completed with a full_type_declaration seems arbitrary.  There
seems no inherent reason that it couldn't be completed with a private type or
private extension declaration.

The incomplete types produced from a "limited with" allow that, so the mechanism
will be needed in Ada 2005 to handle such a completion anyway.

> In conclusion, this rearrangement works, has some client 
> incompatibility (but it probably isn't important), but forces a major 
> amount of junk type conversions in the body of the package. Of course, 
> this is only one example, and it would be nice to look at additional examples
> in the future.

Normally I would have expected that almost the entire package would be included
in an "integrated" package, and then this big nested package would be followed by
various generic instantiations.  But you have illustrated a case where that doesn't
work, because you want one of the types to be built on the result of a generic
instantiation.  I wonder how common that is.

How would you do it without this feature?  As you point out, the integrated package
is mostly just a short-hand for a bunch of renames, and in this case, there aren't
that many.  Hiding the nested package name automatically from use-visibility might
be a helpful advantage of the proposed feature ;-).

It does seem the ability to define an incomplete type that is completed by a partial
view would be helpful, independent of this proposal.

I wonder if your "magic iterator" generic has this problem in general, and perhaps
it needs to be *defining* a cursor type, rather than taking it as a parameter.  Or
is it the fact that we want the container to be usable as an iterator?  I think that
is probably a bad idea in general, as it is inherently problematic to have the container
carry some state about a particular iterator over the container.  I understand the
desire for a *syntax* approximating:

     for Element in Container loop
        ...
     end loop;

But I see no requirement that this means we somehow have an iterator built into the
Container itself.  Also, we would certainly want to allow multiple tasks to iterate
over a Container in parallel, I would think, or to allow nested iterators such as:

     for Left in Container loop
        for Right in Container loop
           Try_Pair(Left, Right);
        end loop;
     end loop;

I think the above *syntax* should always presume there is an anonymous iterator object
created, and that is what carries the state of the iteration.

In any case, I wonder whether it is the strange nature of this generic that is creating
the problem, or is there a common need to be able to define in a single package two
recursively-dependent types one of which is based on a generic instantiation of the other.
I suppose this is a very hard question to answer...

Thanks for the example!

****************************************************************

From: Randy Brukardt
Sent: Thursday, May 28, 2009  2:49 PM

...
> This particular problem seems fixable.  The requirement that an 
> incomplete type declaration be completed with a full_type_declaration 
> seems arbitrary.  There seems no inherent reason that it couldn't be 
> completed with a private type or private extension declaration.
> The incomplete types produced from a "limited with" allow that, so the 
> mechanism will be needed in Ada 2005 to handle such a completion 
> anyway.

Yes, I in fact wrote this up originally assuming that such completions were
allowed. But before making a fool out of myself, I looked up the actual rules
and changed this part accordingly.

It would still clutter the spec a bit with an extra type declaration that
doesn't really have anything to do with anything other than the implementation,
but not a major problem overall.

> > In conclusion, this rearrangement works, has some client 
> > incompatibility (but it probably isn't important), but forces a 
> > major amount of junk type conversions in the body of the package. Of 
> > course, this is only one example, and it would be nice to look at
> additional examples in the future.
> 
> Normally I would have expected that almost the entire package would be 
> included in an "integrated" package, and then this big nested package 
> would be followed by various generic instantiations.  But you have 
> illustrated a case where that doesn't work, because you want one of 
> the types to be built on the result of a generic instantiation.  I 
> wonder how common that is.

No idea. It seems to need interfaces to make much sense, so it would mainly
come up in new code.

> How would you do it without this feature?  As you point out, the 
> integrated package is mostly just a short-hand for a bunch of renames, 
> and in this case, there aren't that many.
> Hiding the nested package name automatically from use-visibility might 
> be a helpful advantage of the proposed feature ;-).

I was surprised to find out that it doesn't seem to matter much either way; the key
is to use the nested package. Making it "integrated" is just icing.

> It does seem the ability to define an incomplete type that is 
> completed by a partial view would be helpful, independent of this 
> proposal.

Yes, I think so.

> I wonder if your "magic iterator" generic has this problem in general, 
> and perhaps it needs to be *defining* a cursor type, rather than 
> taking it as a parameter.

I don't know how that could work, given that container cursors include a reference
to the container object. I suppose the entire thing could be created as a mix-in generic
(adding to the container directly), but that seems annoying (isn't that exactly the sort
of thing that interfaces were created to avoid?).

> Or is it the fact that
> we want the container to be usable as an iterator?  I think that is 
> probably a bad idea in general, as it is inherently problematic to 
> have the container carry some state about a particular iterator over 
> the container.

I strongly agree with this, but you need to convince Ed. :-) Maybe the rest of
your message will help.

> I understand the desire for a *syntax* approximating:
> 
>      for Element in Container loop
>         ...
>      end loop;

I had proposed
    for Element in Container.Iterator loop
       ...
    end loop;
(where "Iterator" is a function returning an anonymous iterator object).
Since we're using that same sort of model for accessors (there is an "extra"
discriminant reference in those), I don't see the problem. But of course, I
was always happy with this formulation, it is other people that were not.

> But I see no requirement that this means we somehow have an iterator 
> built into the Container itself.  Also, we would certainly want to 
> allow multiple tasks to iterate over a Container in parallel, I would 
> think, or to allow nested iterators such as:
> 
>      for Left in Container loop
>         for Right in Container loop
>            Try_Pair(Left, Right);
>         end loop;
>      end loop;
> 
> I think the above *syntax* should always presume there is an anonymous 
> iterator object created, and that is what carries the state of the 
> iteration.

With the accessor proposal, you have to write a single extra selection; that
seems fine to me for iterators, too. Since it would not be surprising to use
both of them together, it would actually be kinda weird to have simpler syntax
for the iterator and not the accessor:

    for Cursor in Container.Iterator loop
        Container.Accessor(Cursor).Element.Component := 10;
    end loop;

> In any case, I wonder whether it is the strange nature of this generic 
> that is creating the problem, or is there a common need to be able to 
> define in a single package two recursively-dependent types one of 
> which is based on a generic instantiation of the other.  I suppose 
> this is a very hard question to answer...

I would expect a similar problem any time that you wanted to define an interface
that needs customization with some properties. But it is hard to tell for sure, other
than to say that designers are clever and come up with structures that no one expects.

****************************************************************

From: Bob Duff
Sent: Wednesday, June 17, 2009  4:17 PM

> I understand the desire
> for a *syntax* approximating:
> 
>      for Element in Container loop
>         ...
>      end loop;
> 
> But I see no requirement that this means we somehow have an iterator 
> built into the Container itself.

I strongly agree.  Eiffel went this way (state of iterator in the container) early on,
and it's a disaster.

A "for Element in Container" kind of syntax should not imply cursor-state-in-container
any more than "for I in 1..10" implies that any Integer contains a "current Integer"!

>...Also, we would certainly want to allow  multiple tasks to iterate 
>over a Container in parallel, I would think,  or to allow nested 
>iterators such as:
> 
>      for Left in Container loop
>         for Right in Container loop
>            Try_Pair(Left, Right);
>         end loop;
>      end loop;
> 
> I think the above *syntax* should always presume there is an anonymous 
> iterator object created, and that is what carries the state of the 
> iteration.

Yes.

> In any case, I wonder whether it is the strange nature of this generic 
> that is creating the problem, or is there a common need to be able to 
> define in a single package two recursively-dependent types one of 
> which is based on a generic instantiation of the other.  I suppose 
> this is a very hard question to answer...
> 
> Thanks for the example!

Yes.  It's an interesting and useful example.

---

> I had proposed
>     for Element in Container.Iterator loop
>        ...
>     end loop;
> (where "Iterator" is a function returning an anonymous iterator object).

That's OK, but "for Element in Container" could be some sort of short-hand for
"the default iterator".  E.g. the one that goes forward, and you need to say
something extra to get the backward one.

> Since we're using that same sort of model for accessors (there is an "extra"
> discriminant reference in those), I don't see the problem. But of 
> course, I was always happy with this formulation, it is other people that were not.
...
> With the accessor proposal, you have to write a single extra 
> selection; that seems fine to me for iterators, too. Since it would 
> not be surprising to use both of them together, it would actually be 
> kinda weird to have simpler syntax for the iterator and not the accessor:
> 
>     for Cursor in Container.Iterator loop
>         Container.Accessor(Cursor).Element.Component := 10;
>     end loop;

Hmm...

****************************************************************

Questions? Ask the ACAA Technical Agent