Version 1.3 of ai05s/ai05-0139-1.txt

Unformatted version of ai05s/ai05-0139-1.txt version 1.3
Other versions for file ai05s/ai05-0139-1.txt

!standard 5.5          09-02-13 AI05-0139-1/01
!class Amendment 09-02-13
!status work item 09-02-13
!status received 09-01-19
!priority Medium
!difficulty Medium
!subject User-defined iterators
!summary
(See proposal.)
!problem
Using an iterator over one of the Ada 2005 containers libraries requires writing a subprogram which gets called repeatedly with each element of the container. That effectively requires writing the loop body out-of-line from the loop, making the code inside-out of its normal flow, requiring the writing of a lot of extra text, and thus making it much harder to read and understand.
A better mechanism is required.
!proposal
Define a magic generic unit that defines a pair of interfaces:
generic type Cursor is private; No_Element : in Cursor; package Ada.Iterator_Interfaces is
type Basic_Iterator is limited interface; function First (Object : Basic_Iterator) return Cursor; function Next (Object : Basic_Iterator; Position : Cursor) return Cursor;
type Reversible_Iterator is new Basic_Iterator with null record; function Last (Object : Reversible_Iterator) return Cursor; function Previous (Object : Reversible_Iterator; Position : Cursor) return Cursor;
end Ada.Iterator_Interfaces;
Note that the routines defined in the magic generic generally match those used in iteration for the Ada.Containers packages, and their meaning is the same. The only difference is that we've added an iterator object parameter to the Next and Previous functions (which is necessary to make them part of the interface).
We then add a new kind of loop parameter specification:
defining_identifier in [reverse] iterator_expression
The rules for this are:
* The iterator_expression is expected to be of an (instance of Iterator_Interfaces).Basic_Iterator'Class [the parens here mean that we're talking about (any) type declared in an instance of Iterator_Interfaces];
* The type of the defining_identifier is the appropriate Cursor type for the instance of Iterator_Interfaces;
* the keyword "reverse" can only appear if the type is of an (instance of Iterator_Interfaces).Reversible_Iterator'Class;
* The iterator_expression is a build-in-place context;
* the master of the iterator_expression is the (entire) loop; the iterator_expression continues to exist until the loop is completed (normally or abnormally).
The loop executes by calling First (or Last if the keyword "reverse" is present), followed by calling Next (or Previous if the keyword "reverse" is present) for each following iteration until the function called returns No_Element. The loop body is executed once for each cursor value (other than No_Element) obtained from the functions with the loop parameter set to that cursor value.
!wording
** TBD **
!discussion
This is based on the earlier proposal and discussion recorded in AC-0112.TXT. This proposal was modified to address the comments made on that one, and has been simplified considerably.
Note that the iterator interface can be directly applied to a container type (allowing it to be directly iterated) or it can be part of a separate function (which would allow adding additional parameters to the iterator and/or adding additional storage for the iterators). By using interfaces, we give maximum flexibility to the programmer that is using these iterators.
If the creator of the iterator needs extra storage to accomplish the iteration, they can include it in the type of the iterator.
If the creator of the iterator needs to get control when the loop is exited prematurely, they can define a controlled iterator type. In that case, (presuming that the iterator_expression was a function call) when the loop is exited, the Finalization routine for the iterator object will be called.
We provide a form for reverse iterators. Reverse iteration makes sense for many kinds of containers (vectors and lists, for instance). Given that the syntax already exists and every Ada programmer is familiar with it, not providing it would quickly become one of the most common frustrations of programming in Ada. That seems silly.
We could have simplified the proposal by requiring that all iterators support Last and Previous, but iterators where reverse iteration does not make sense would have to raise an exception when Last is called. The proposed features, in contrast, make a compile-time determination as to whether reverse is allowed. That will detect errors earlier; it generally is better to detect errors at compile-time if possible.
Issues:
This proposal requires some solution to the instantiation for private types problem (see the many alternatives for AI05-0074) in order to be useful in the Ada.Containers packages. As this is a "magic" generic (really to provide a signature), we could relax the rules for it (only), but that is pretty distasteful.
This proposal requires some solution to the modify-in-place problem for the containers packages (see earlier proposals). Without that, modifying an element in an iterator would require writing an extra subprogram, meaning the new syntax would not have gained anything.
The rule about an iterator_expression being a build-in-place context is intended to allow iterators to get control on premature loop exit. If we didn't have that, it would not be possible to prevent reuse of iterator objects and thus the finalization of the objects would not necessarily occur when the loop is exited. This is somewhat of a hack - there is no intrinsic reason why an iterator object should not be allowed to be reused. Note, however, that reusable iterator types can be defined: they just have to be non-limited (and presumably not controlled). So perhaps we can justify this limitation in the service of additional control.
The use of a controlled type for an iterator in this proposal would require bounded containers to be preelaborable units, rather than pure units. We add a prohibition on the actual container from being controlled, so that finalization adverse users could still use the bounded containers (but obviously not the iterators).
Alternatives considered:
One option considered was including the name of the cursor type somewhere in the syntax. Something like:
defining_identifier : subtype_mark in [reverse] iterator_expression
That was not done mainly because differs from the existing syntax for no clear benefit. It would make the type of the identifier clearer, but it also would mean that switching from a discrete iterator to a container one would require some syntax changes that don't really fit into the model. (One solution to that problem would be to optionally add the type name to the discrete for loop as well, but that seems like a bunch of additional changes for little value.)
It should be noted that with this formulation, the existing discrete iterators could be considered just a predefined instantiation of this magic generic for discrete types. The analogy breaks down a bit for ranges (they don't look like a function call), but it isn't hard to imagine such a unification.
A more important consideration was allowing elements of a container to be directly iterated on. That is, the type of the loop parameter could be the element of a container (that is, anything at all) rather than the cursor to the item.
That would require the element type to be included in the generic somewhere, as well as an accessor function.
But the important consideration would be how the iterated element actually is handled. The programmer may only want read-only access to the elements, or they may want to change a copy of the element, or they my need to change the element in place. It is difficult to make these decisions for the programmer, and providing a large family of iterators quickly gets complicated.
Given all of this, and that existing Ada iteration is on read-only index values (which are can be considered a restricted form of cursors), we only provide iterator on cursors, which are treated as constants within the loop. This way the programmer can pick the most appropriate way to access the actual element: via Element, Update_Element, or (hopefully) some form of Modifiable_Element.
!examples
This feature could easily be used in the existing Ada.Containers packages. We give a suggested implementation for Ada.Containers.Doubly_Linked_Lists; the other containers could be similar.
Somewhere in the visible part of Ada.Containers.Doubly_Linked_Lists we would add an instantiation of Iterator_Interfaces:
package List_Iterators is Ada.Iterator_Interfaces (Cursor, No_Element);
And a function Iterate:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class;
In the private part of Ada.Containers.Doubly_Linked_Lists, we'd define a controlled type:
type Our_List_Iterator is new Ada.Controlled.Limited_Controlled and List_Iterators.Reversible_Iterator with record Our_List : access List; end record; function First (Object : Our_List_Iterator) return Cursor; function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor; function Last (Object : Our_List_Iterator) return Cursor; function Previous (Object : Our_List_Iterator; Position : Cursor) return Cursor; procedure Finalize (Object : in out Our_List_Iterator);
The implementation would be something like:
function Iterate (Container : in List) return List_Iterators.Reversible_Iterator'Class is begin return Iter:Our_List_Iterator do Iter.Our_List := Container'Unchecked_Access; -- This is safe unless someone does an Unchecked_Deallocation on -- the container in the loop; we don't try to protect against that -- in other cases for Ada.Containers, so we won't here, either. -- Set a tampering context for Iter.Our_List.all. end return; end Iterate;
function First (Object : Our_List_Iterator) return Cursor is begin return Object.Our_List.First; end First;
function Next (Object : Our_List_Iterator; Position : Cursor) return Cursor is begin return Next (Position); end Next;
-- Similar implementations for Last and Previous. procedure Finalize (Object : in out Our_List_Iterator) is begin -- Clear the tampering context for Object.Our_List. end Finalize;
Note that we didn't add these interfaces to the actual container types. That was because we wanted these iterators to be a tampering context (thus ensuring that they terminate and do not lead to erroneous execution). This is important from an analysis perspective: proving that a loop terminates is equivalent to proving that it is equivalent to a for loop. For this reason, we don't want for loops that don't terminate (at least logically; if the container is very large, the loop might not terminate for a long time).
We could also define a partial list iterator:
function Partial_Iterate (Container : in List; Start : in Cursor; Finish : in Cursor := No_Element) return List_Iterators.Reversible_Iterator'Class;
where Start and Finish (if not No_Element) would have to be cursors for Container. In this case, the iteration would run from Start to Finish (or No_Element, if Finish is not on the list after/before Start) forward or in reverse. (This would also be a tampering context as above.)
We would use Partial_Iterator directly in a loop:
for C in Partial_Iterate (My_List, Start, Finish) loop ... Element (C) ... end loop;
!ACATS test
!appendix

From: Randy Brukardt
Date: Monday, January 19, 2009  11:31 PM

Here's the third of my trial balloons. For those following the recent
comp.lang.ada discussion, this hopefully explains why I said that defining
proper syntactic iterators was "trivial". It least I think this is a very simple
proposal that really has no reason not to be adopted other than the needed
prerequisites.

[This is version /01 of the AI - ED.]

****************************************************************

From: Randy Brukardt
Date: Tuesday, January 20, 2009  12:27 PM

There's a couple of points that I left out of this one yesterday:

---

The use of a controlled type for an iterator would require bounded containers to
be preelaborable units, rather than pure units. We'd still have a prohibition on
the actual container from being controlled, so that finalization adverse users
could still use the bounded containers (but obviously not the iterators).

---

I had said "This proposal requires some solution to the modify-in-place problem
for the containers packages." That was true in my original draft, which tried
put the iterator interface directly on the container type. Since I gave up that
model for Ada.Containers (in order to support tampering), it would be possible
to hide an unsafe accessor in the private part and allow it to be used only in
an element iterator (which, being a tampering context, is safe from most bad
operations). So an access-to-element alternative could in fact be accomplished
safely and practically without having to make unsafe operations available. Doing
so would definitely require a solution to the instantiation on partial types
problem (the current solution could be put into a child package although that
would make the usage unnecessarily messy).

****************************************************************

From: Randy Brukardt
Date: Saturday, February 28, 2009  9:10 PM

At the recent meeting, we had a rather forceful discussion about "safe" vs.
"unsafe" iterators. I said that I would not stand for unsafe iterators on the
predefined containers. (The existing containers have safe iterators as they do
not allow "tampering with cursors" and require checks to prevent that.)

The discussion about what "safe" meant concentrated on the fact that a "safe"
iterator has guaranteed termination. That probably led some of you to think I
was overstating the case for "safe" iterators. Unfortunately, I was too busy
taking notes to be able to come up with the other part of safe iterators: they
can't become erroneous as long as the container object exists.

The following loop (in the absence of a tampering check) is erroneous:

    for C in My_List loop
       if <Complex-expression-that-is-True-for-First(My_List)> then
          Delete_First (My_List);
       end if;
    end loop;

The problem is that the cursor C has become invalid (A.18.3(153-156/2)); the
iterator will then do an Next operation on it, which is erroneous by
A.18.3(157/2). This is really hard to see in this formulation (which is why it
is so dangerous), so lets look at the loop written explicitly:

    declare
       C : Cursor := First (My_List);
    begin
       while C /= No_Element loop
          if <Complex-expression-that-is-True-for-First(My_List)> then
              Delete_First (My_List);
          end if;
          C := Next (C); -- (!)
       end loop;
    end;

Here we can see the use of C after it becomes dangling, but even here it isn't
obvious that it is dangling.

Recall that one could also cause this problem using Delete, Splice, and various
other operations. (Many of those would require making a copy of the cursor,
which hopefully would be a red flag -- but not necessarily.) And the bad
operation could have been called in a subprogram called from three levels of
calls started from this loop - it is not necessarily right in the loop.

Since this form of loop will be the most common in practice (no one is going to
write an old passive iterator if they have this syntax available), it should be
reasonably safe and should not become erroneous easily. Thus a tampering check
(as in the existing procedure Iterate) is needed.

Two caveats to this discussion: (1) It is possible for the container in an
iterator to be destroyed (usually via Unchecked_Deallocation) while an iterator
is running. That is clearly erroneous by existing language rules, and there is
nothing practical that can be done to prevent it. So even a "safe" iterator is
not completely safe. But it seems unusual to be destroying a container that way;
deleting an element is a much more likely operation.

(2) Another way to prevent this particular problem would be to require the use
of "invalid" cursors that reference a container that still exists to raise
Program_Error rather than be erroneous. That, however, is a distributed overhead
that may not be appropriate to require on all containers. Most existing
containers implementations do some form of such a check, but it is not clear
that any cheap scheme for such a check would be sufficient to implement a
language requirement. That's because a cheap check would use some sort of serial
number scheme, and it would always be possible to do a set of operations that
would cause the check to fail to be made. (It would be pretty unlikely that that
would happen.) Requiring such checks would have the effect of making all
containers usage safer (such as the user-written loop in the second example),
and would reduce the iterators issues solely to termination (for which I still
would disagree about avoiding checks but would not be required to bang my shoes
on the desk and stalk out... ;-)

Note that we could only make this requirement in the case covered by
A.18.3(156/2); the other cases have to remain erroneous. So we'd probably have
to define a new cursor state (say "dangling") which was guaranteed to raise
Program_Error.

****************************************************************

From: Matthew Heaney
Date: Tuesday, March  3, 2009  5:48 PM

> Since this form of loop will be the most common in practice (no one is
> going to write an old passive iterator if they have this syntax
> available), it should be reasonably safe and should not become
> erroneous easily. Thus a tampering check (as in the existing procedure
> Iterate) is needed.

I don't know whether this is relevant, but in GNAT this error is detected:

with Ada.Containers.Doubly_Linked_Lists;

procedure Test_Iter is
    package Lists is new Ada.Containers.Doubly_Linked_Lists (Positive);
    use Lists;

    L : List;
    C : Cursor;

begin
    L.Append (1);
    L.Append (2);
    L.Append (3);
    C := L.First;

    while C /= No_Element loop
       if Element (C) = 1 then
          L.Delete_First;
       end if;
       C := Next (C);
    end loop;
end Test_Iter;


$ gnatmake -gnata -gnat05 test_iter
gcc -c -gnata -gnat05 test_iter.adb
gnatbind -x test_iter.ali
gnatlink test_iter.ali

$ ./test_iter
raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : bad cursor in Next

You have to compile with assertions enabled to turn on the detection, but
otherwise it just works, without requiring any extra tampering checks.

In any event, I don't know how you'd implement a tampering check when you're
performing an active iteration.  There is no "stack" as you have when performing
a passive iteration.

****************************************************************

From: Randy Brukardt
Date: Tuesday, March  3, 2009  6:03 PM

> I don't know whether this is relevant, but in GNAT this error is
> detected:
...

Well, one option obviously would be require the detection of this error. But
then there is a distributed overhead (at least I presume so, since you allow
turning the check off).

...
> In any event, I don't know how you'd implement a tampering check when
> you're performing an active iteration.  There is no "stack" as you
> have when performing a passive iteration.

True, but a syntactical "for" loop is much more like a passive iterator than an
active one. The mechanism of iteration is hidden in it (like a passive iterator
but unlike an active one). So the user of it has no reason to know what
operations will cause it to fail to work. (Yes, they could "lift the hood" and
poke around at how the implementation is defined, but surely we want to people
to be able to use the abstraction without doing that.)

My net takeway is that we want "for loop" syntax to work as much like the
passive iterator as possible (with the added bonus that exits are much easier
with the "for loop" syntax). After all, no sane person will ever call the
passive iterator again once the "for loop" syntax is available. Getting much
less safe is going in the wrong direction.

****************************************************************

From: Matthew Heaney
Date: Tuesday, March  3, 2009  6:37 PM

> Well, one option obviously would be require the detection of this
> error. But then there is a distributed overhead (at least I presume
> so, since you allow turning the check off.

There's a function we call anytime a cursor is passed as a parameter (well,
called when assertions are enabled).  If there's a problem the condition is
usually detectable right away (e.g. when deallocating a node, we set the prev
and next pointer to the value of the node being deleted, and later test for this
case).

> True, but a syntactical "for" loop is much more like a passive
> iterator than an active one. The mechanism of iteration is hidden in
> it (like a passive iterator but unlike an active one). So the user of
> it has no reason to know what operations will cause it to fail to
> work. (Yes, they could "lift the hood" and poke around at how the
> implementation is defined, but surely we want to people to be able to
> use the abstraction without doing that.)

If you're talking about adding a first-class language construct for iterating
over containers, then yes, manipulating the tampering bits would work.  (I
assume that the mechanism would give the implementor some kind of hook a la
Controlled.Initialize to indicate that iteration has started.)

The VB(script) runtime works this way.  The container object has a distinguished
_NewEnum method that is called when the loop "elaborates", returning an object
that is sort of a context for the iteration, and the script run-time uses this
to maintain iterator state:

   Dim objFiles, objFile;

   Set objFiles = objFolder.Files

   For Each objFile In objFiles
      'do something
      If objFile.Predicate Then
         Exit For
      End If
   Next

The iteration object ("enumerator") is automatically destroyed when the loop
terminates.

****************************************************************

From: Bob Duff
Date: Tuesday, March  3, 2009  6:34 PM

> At the recent meeting, we had a rather forceful discussion about "safe" vs.
> "unsafe" iterators. I said that I would not stand for unsafe iterators
> on the predefined containers. (The existing containers have safe
> iterators as they do not allow "tampering with cursors" and require
> checks to prevent
> that.)
>
> The discussion about what "safe" meant concentrated on the fact that a
> "safe" iterator has guaranteed termination.

I presume you mean that they should terminate for the predefined containers. And
allow me to create iterators that don't terminate, right?  I mean, infinite data
structures can be useful.

I think I agree with your other comments, on tampering and so forth.

****************************************************************

From: Randy Brukardt
Date: Wednesday, March  4, 2009  1:39 PM

...
> > The discussion about what "safe" meant concentrated on the fact that
> > a "safe" iterator has guaranteed termination.
>
> I presume you mean that they should terminate for the predefined containers.

Correct, I was only talking about the predefined containers.

> And allow me to create iterators that don't terminate, right?
>  I mean, infinite data structures can be useful.

There is nothing that the language could do to guarantee termination.
Personally, I find the use of the keyword "for" with something that does not
terminate abhorrent (see below), but this is clearly no different than defining
"*" to do a divide. So you would be able to do what you want.

One of the great features of Ada is that you can't modify the for loop
parameter. Thus, when you see the keyword "for", you know automatically that the
loop terminates without any further analysis. Indeed, I was taught that the only
practical way to prove loop termination is to show that the loop is equivalent
to a for loop -- and that is a technique that I use (informally) periodically.
(Since 30 years have gone by since I learned that, perhaps some better technique
has been invented.) That technique would not be very useful if for loops
themselves were not guaranteed to terminate.

While it's clear that we can't *force* iterators to be written so they always
terminate (that would require proving the absence of bugs in the definition of
an iterator, which is surely beyond what Ada can do - maybe SPARK could do it,
but it would have to learn about exceptions to do so), surely we want anything
that is language-defined to have that property. And personally, I'll consider
any non-terminating iterators to be broken, but YMMV.

****************************************************************

From: Robert I. Eachus
Date: Wednesday, March  4, 2009  9:30 PM

>Indeed, I was taught that the only practical way to prove loop termination is
>to show that the loop is equivalent to a for loop -- and that is a technique
>that I use (informally) periodically. (Since 30 years have gone by since I
>learned that, perhaps some better technique has been invented.) That technique
>would not be very useful if for loops themselves were not guaranteed to
>terminate.

Off topic, but:I thought it might be of interest to more than Randy:

Yes, there are better methods. The generalization of the for loop approach is to
associate the problem space with a finite set.  If the algorithm visits another
member of the set after a  finite step, and never revisits a set member, the
program will always terminate.(There are a lot of cases where allowing the final
node to be visited twice simplifies the proof, but that is a detail.)

For a trivial example, say you want to find the shortest route through a graph
containing n nodes so that each node is visited once..  The problem space is the
n factorial permutations of n.  Any algorithm that visits members of the problem
space only once, and either terminates there or moves to a new member within a
finite time, always terminates. This proof works for the brute force algorithm
that visits all permutations, and also works for much more complex (and faster)
algorithms that eliminate entire sets of permutations at each step. (At least
for graphs with non-negative distances.) If a solution was allowed to pass
through some nodes more than once, I guess you could generalize to visiting each
point in the solution set at most n times, but an alternative is to renumber the
nodes.  Try adding a count of untraversed adjoining edges to the nodes to get a
larger problem set.

There is another way to prove termination, that became somewhat popular around
1980 with respect to linear programming and other optimization techniques.
Unfortunately for actual computer codes, instead of pure math, it is often very
painful.  Associate a real (not integer or floating-point) variable with the
loop, and find a definition such that both that the value of variable has a
limit, and that each iteration of the loop monotonically  increases (or
decreases) the variable.  Now if the termination condition requires only that
the variable reach a value close to* the limit, the loop terminates.

Many geometric methods to solve linear programming problems use some variation
on this to show termination.  More practically, some actually use it!  Prior to
starting the program, you have no clue as to what the final, correct answer is.
But you do know from the initial conditions that you can specify a volume such
that any volume that small contains a single potential solution. (To put in LP
terms, when the volume is that small the set of slack variables can be easily
read off.)  So you have two loops, one which reduces the volume known to contain
the solution at a rate with a polynomial bound in the parameters, and a second
short loop in the number of inequalities which reads out the solution. ;-)

*Yes, I know I am being mathematically imprecise here. Any real proof will
define both the behavior of the variable near the limit, and the distance
between the termination condition and the actual limit in terms of some common
epsilon.  Obviously, using floating point of some sort rather than real numbers
is where the pain comes in.  The distance between the termination condition and
the actual limit has to be  large relative to the numerical accuracy of the
representation, and no two steps in the loop can  be closer together than about
twice the epsilon of the representation. So a proof using real numbers is just a
starting point.

****************************************************************


Questions? Ask the ACAA Technical Agent