CVS difference for ai05s/ai05-0139-1.txt

Differences between 1.2 and version 1.3
Log of other versions for file ai05s/ai05-0139-1.txt

--- ai05s/ai05-0139-1.txt	2009/02/15 07:58:28	1.2
+++ ai05s/ai05-0139-1.txt	2009/03/15 03:51:06	1.3
@@ -306,3 +306,334 @@
 
 ****************************************************************
 
+From: Randy Brukardt
+Date: Saturday, February 28, 2009  9:10 PM
+
+At the recent meeting, we had a rather forceful discussion about "safe" vs.
+"unsafe" iterators. I said that I would not stand for unsafe iterators on the
+predefined containers. (The existing containers have safe iterators as they do
+not allow "tampering with cursors" and require checks to prevent that.)
+
+The discussion about what "safe" meant concentrated on the fact that a "safe"
+iterator has guaranteed termination. That probably led some of you to think I
+was overstating the case for "safe" iterators. Unfortunately, I was too busy
+taking notes to be able to come up with the other part of safe iterators: they
+can't become erroneous as long as the container object exists.
+
+The following loop (in the absence of a tampering check) is erroneous:
+
+    for C in My_List loop
+       if <Complex-expression-that-is-True-for-First(My_List)> then
+          Delete_First (My_List);
+       end if;
+    end loop;
+
+The problem is that the cursor C has become invalid (A.18.3(153-156/2)); the
+iterator will then do an Next operation on it, which is erroneous by
+A.18.3(157/2). This is really hard to see in this formulation (which is why it
+is so dangerous), so lets look at the loop written explicitly:
+
+    declare
+       C : Cursor := First (My_List);
+    begin
+       while C /= No_Element loop
+          if <Complex-expression-that-is-True-for-First(My_List)> then
+              Delete_First (My_List);
+          end if;
+          C := Next (C); -- (!)
+       end loop;
+    end;
+
+Here we can see the use of C after it becomes dangling, but even here it isn't
+obvious that it is dangling.
+
+Recall that one could also cause this problem using Delete, Splice, and various
+other operations. (Many of those would require making a copy of the cursor,
+which hopefully would be a red flag -- but not necessarily.) And the bad
+operation could have been called in a subprogram called from three levels of
+calls started from this loop - it is not necessarily right in the loop.
+
+Since this form of loop will be the most common in practice (no one is going to
+write an old passive iterator if they have this syntax available), it should be
+reasonably safe and should not become erroneous easily. Thus a tampering check
+(as in the existing procedure Iterate) is needed.
+
+Two caveats to this discussion: (1) It is possible for the container in an
+iterator to be destroyed (usually via Unchecked_Deallocation) while an iterator
+is running. That is clearly erroneous by existing language rules, and there is
+nothing practical that can be done to prevent it. So even a "safe" iterator is
+not completely safe. But it seems unusual to be destroying a container that way;
+deleting an element is a much more likely operation.
+
+(2) Another way to prevent this particular problem would be to require the use
+of "invalid" cursors that reference a container that still exists to raise
+Program_Error rather than be erroneous. That, however, is a distributed overhead
+that may not be appropriate to require on all containers. Most existing
+containers implementations do some form of such a check, but it is not clear
+that any cheap scheme for such a check would be sufficient to implement a
+language requirement. That's because a cheap check would use some sort of serial
+number scheme, and it would always be possible to do a set of operations that
+would cause the check to fail to be made. (It would be pretty unlikely that that
+would happen.) Requiring such checks would have the effect of making all
+containers usage safer (such as the user-written loop in the second example),
+and would reduce the iterators issues solely to termination (for which I still
+would disagree about avoiding checks but would not be required to bang my shoes
+on the desk and stalk out... ;-)
+
+Note that we could only make this requirement in the case covered by
+A.18.3(156/2); the other cases have to remain erroneous. So we'd probably have
+to define a new cursor state (say "dangling") which was guaranteed to raise
+Program_Error.
+
+****************************************************************
+
+From: Matthew Heaney
+Date: Tuesday, March  3, 2009  5:48 PM
+
+> Since this form of loop will be the most common in practice (no one is
+> going to write an old passive iterator if they have this syntax
+> available), it should be reasonably safe and should not become
+> erroneous easily. Thus a tampering check (as in the existing procedure
+> Iterate) is needed.
+
+I don't know whether this is relevant, but in GNAT this error is detected:
+
+with Ada.Containers.Doubly_Linked_Lists;
+
+procedure Test_Iter is
+    package Lists is new Ada.Containers.Doubly_Linked_Lists (Positive);
+    use Lists;
+
+    L : List;
+    C : Cursor;
+
+begin
+    L.Append (1);
+    L.Append (2);
+    L.Append (3);
+    C := L.First;
+
+    while C /= No_Element loop
+       if Element (C) = 1 then
+          L.Delete_First;
+       end if;
+       C := Next (C);
+    end loop;
+end Test_Iter;
+
+
+$ gnatmake -gnata -gnat05 test_iter
+gcc -c -gnata -gnat05 test_iter.adb
+gnatbind -x test_iter.ali
+gnatlink test_iter.ali
+
+$ ./test_iter
+raised SYSTEM.ASSERTIONS.ASSERT_FAILURE : bad cursor in Next
+
+You have to compile with assertions enabled to turn on the detection, but
+otherwise it just works, without requiring any extra tampering checks.
+
+In any event, I don't know how you'd implement a tampering check when you're
+performing an active iteration.  There is no "stack" as you have when performing
+a passive iteration.
+
+****************************************************************
+
+From: Randy Brukardt
+Date: Tuesday, March  3, 2009  6:03 PM
+
+> I don't know whether this is relevant, but in GNAT this error is
+> detected:
+...
+
+Well, one option obviously would be require the detection of this error. But
+then there is a distributed overhead (at least I presume so, since you allow
+turning the check off).
+
+...
+> In any event, I don't know how you'd implement a tampering check when
+> you're performing an active iteration.  There is no "stack" as you
+> have when performing a passive iteration.
+
+True, but a syntactical "for" loop is much more like a passive iterator than an
+active one. The mechanism of iteration is hidden in it (like a passive iterator
+but unlike an active one). So the user of it has no reason to know what
+operations will cause it to fail to work. (Yes, they could "lift the hood" and
+poke around at how the implementation is defined, but surely we want to people
+to be able to use the abstraction without doing that.)
+
+My net takeway is that we want "for loop" syntax to work as much like the
+passive iterator as possible (with the added bonus that exits are much easier
+with the "for loop" syntax). After all, no sane person will ever call the
+passive iterator again once the "for loop" syntax is available. Getting much
+less safe is going in the wrong direction.
+
+****************************************************************
+
+From: Matthew Heaney
+Date: Tuesday, March  3, 2009  6:37 PM
+
+> Well, one option obviously would be require the detection of this
+> error. But then there is a distributed overhead (at least I presume
+> so, since you allow turning the check off.
+
+There's a function we call anytime a cursor is passed as a parameter (well,
+called when assertions are enabled).  If there's a problem the condition is
+usually detectable right away (e.g. when deallocating a node, we set the prev
+and next pointer to the value of the node being deleted, and later test for this
+case).
+
+> True, but a syntactical "for" loop is much more like a passive
+> iterator than an active one. The mechanism of iteration is hidden in
+> it (like a passive iterator but unlike an active one). So the user of
+> it has no reason to know what operations will cause it to fail to
+> work. (Yes, they could "lift the hood" and poke around at how the
+> implementation is defined, but surely we want to people to be able to
+> use the abstraction without doing that.)
+
+If you're talking about adding a first-class language construct for iterating
+over containers, then yes, manipulating the tampering bits would work.  (I
+assume that the mechanism would give the implementor some kind of hook a la
+Controlled.Initialize to indicate that iteration has started.)
+
+The VB(script) runtime works this way.  The container object has a distinguished
+_NewEnum method that is called when the loop "elaborates", returning an object
+that is sort of a context for the iteration, and the script run-time uses this
+to maintain iterator state:
+
+   Dim objFiles, objFile;
+
+   Set objFiles = objFolder.Files
+
+   For Each objFile In objFiles
+      'do something
+      If objFile.Predicate Then
+         Exit For
+      End If
+   Next
+
+The iteration object ("enumerator") is automatically destroyed when the loop
+terminates.
+
+****************************************************************
+
+From: Bob Duff
+Date: Tuesday, March  3, 2009  6:34 PM
+
+> At the recent meeting, we had a rather forceful discussion about "safe" vs.
+> "unsafe" iterators. I said that I would not stand for unsafe iterators
+> on the predefined containers. (The existing containers have safe
+> iterators as they do not allow "tampering with cursors" and require
+> checks to prevent
+> that.)
+>
+> The discussion about what "safe" meant concentrated on the fact that a
+> "safe" iterator has guaranteed termination.
+
+I presume you mean that they should terminate for the predefined containers. And
+allow me to create iterators that don't terminate, right?  I mean, infinite data
+structures can be useful.
+
+I think I agree with your other comments, on tampering and so forth.
+
+****************************************************************
+
+From: Randy Brukardt
+Date: Wednesday, March  4, 2009  1:39 PM
+
+...
+> > The discussion about what "safe" meant concentrated on the fact that
+> > a "safe" iterator has guaranteed termination.
+>
+> I presume you mean that they should terminate for the predefined containers.
+
+Correct, I was only talking about the predefined containers.
+
+> And allow me to create iterators that don't terminate, right?
+>  I mean, infinite data structures can be useful.
+
+There is nothing that the language could do to guarantee termination.
+Personally, I find the use of the keyword "for" with something that does not
+terminate abhorrent (see below), but this is clearly no different than defining
+"*" to do a divide. So you would be able to do what you want.
+
+One of the great features of Ada is that you can't modify the for loop
+parameter. Thus, when you see the keyword "for", you know automatically that the
+loop terminates without any further analysis. Indeed, I was taught that the only
+practical way to prove loop termination is to show that the loop is equivalent
+to a for loop -- and that is a technique that I use (informally) periodically.
+(Since 30 years have gone by since I learned that, perhaps some better technique
+has been invented.) That technique would not be very useful if for loops
+themselves were not guaranteed to terminate.
+
+While it's clear that we can't *force* iterators to be written so they always
+terminate (that would require proving the absence of bugs in the definition of
+an iterator, which is surely beyond what Ada can do - maybe SPARK could do it,
+but it would have to learn about exceptions to do so), surely we want anything
+that is language-defined to have that property. And personally, I'll consider
+any non-terminating iterators to be broken, but YMMV.
+
+****************************************************************
+
+From: Robert I. Eachus
+Date: Wednesday, March  4, 2009  9:30 PM
+
+>Indeed, I was taught that the only practical way to prove loop termination is
+>to show that the loop is equivalent to a for loop -- and that is a technique
+>that I use (informally) periodically. (Since 30 years have gone by since I
+>learned that, perhaps some better technique has been invented.) That technique
+>would not be very useful if for loops themselves were not guaranteed to
+>terminate.
+
+Off topic, but:I thought it might be of interest to more than Randy:
+
+Yes, there are better methods. The generalization of the for loop approach is to
+associate the problem space with a finite set.  If the algorithm visits another
+member of the set after a  finite step, and never revisits a set member, the
+program will always terminate.(There are a lot of cases where allowing the final
+node to be visited twice simplifies the proof, but that is a detail.)
+
+For a trivial example, say you want to find the shortest route through a graph
+containing n nodes so that each node is visited once..  The problem space is the
+n factorial permutations of n.  Any algorithm that visits members of the problem
+space only once, and either terminates there or moves to a new member within a
+finite time, always terminates. This proof works for the brute force algorithm
+that visits all permutations, and also works for much more complex (and faster)
+algorithms that eliminate entire sets of permutations at each step. (At least
+for graphs with non-negative distances.) If a solution was allowed to pass
+through some nodes more than once, I guess you could generalize to visiting each
+point in the solution set at most n times, but an alternative is to renumber the
+nodes.  Try adding a count of untraversed adjoining edges to the nodes to get a
+larger problem set.
+
+There is another way to prove termination, that became somewhat popular around
+1980 with respect to linear programming and other optimization techniques.
+Unfortunately for actual computer codes, instead of pure math, it is often very
+painful.  Associate a real (not integer or floating-point) variable with the
+loop, and find a definition such that both that the value of variable has a
+limit, and that each iteration of the loop monotonically  increases (or
+decreases) the variable.  Now if the termination condition requires only that
+the variable reach a value close to* the limit, the loop terminates.
+
+Many geometric methods to solve linear programming problems use some variation
+on this to show termination.  More practically, some actually use it!  Prior to
+starting the program, you have no clue as to what the final, correct answer is.
+But you do know from the initial conditions that you can specify a volume such
+that any volume that small contains a single potential solution. (To put in LP
+terms, when the volume is that small the set of slack variables can be easily
+read off.)  So you have two loops, one which reduces the volume known to contain
+the solution at a rate with a polynomial bound in the parameters, and a second
+short loop in the number of inequalities which reads out the solution. ;-)
+
+*Yes, I know I am being mathematically imprecise here. Any real proof will
+define both the behavior of the variable near the limit, and the distance
+between the termination condition and the actual limit in terms of some common
+epsilon.  Obviously, using floating point of some sort rather than real numbers
+is where the pain comes in.  The distance between the termination condition and
+the actual limit has to be  large relative to the numerical accuracy of the
+representation, and no two steps in the loop can  be closer together than about
+twice the epsilon of the representation. So a proof using real numbers is just a
+starting point.
+
+****************************************************************
+

Questions? Ask the ACAA Technical Agent