CVS difference for ais/ai-00323.txt

Differences between 1.3 and version 1.4
Log of other versions for file ais/ai-00323.txt

--- ais/ai-00323.txt	2006/01/10 22:17:46	1.3
+++ ais/ai-00323.txt	2006/02/21 04:21:46	1.4
@@ -2221,3 +2221,400 @@
 
 ****************************************************************
 
+From: Tucker Taft
+Sent: Saturday, January 21, 2006  7:15 PM
+
+"Out" parameters in functions considered harmful (with apologies to
+Dijkstra).
+
+It appears that all arguments opposing "out" parameters in functions
+have receded into lost ancient history.  So I thought it might be useful
+to try to resurrect some of these arguments, just for the record, so
+something can be added to the Appendix of the AI on this topic.  By the
+way, the only AI that seems to have any useful discussion of this topic
+in its appendix is actually AI-230, rather than AI-323 (the one
+officially on function "out" parameters).  Not quite sure how that
+happened.  Probably reflects the fact that various ARG members drop
+comments about this issue off and on into various random AI-discussions.
+
+So here goes...
+
+The one line "sound bite" rationale for having OUT parameters in
+functions in AI-323 is:
+
+   Ada functions can have arbitrary side effects, but are not allowed to
+   announce that in their specifications
+
+The key distinction between the side-effects of a function that occur
+from code inside the function as opposed to the side-effect of a
+function call inherent in an OUT parameter, is that the side effect
+inside the function is under control of the function implementor.  The
+function implementor uses a side-effect in a function when it is most
+appropriate to the purpose of the function.  In this sense, the
+side-effect is part of the "abstraction" represented by the function.
+For example, a traditional, parameterless random number generator has a
+side-effect of updating its internal "seed".  The semantics of the
+random number generator are that it returns a different value every time
+it is called.  The order of calling is largely irrelevant, since the
+whole point is that successive values are meant to be uncorrelated.
+
+Another example is a function that allocates memory.  There is a
+side-effect on some storage pool abstraction, but again, the exact order
+in which two calls on such a function occur is generally irrelevant,
+since one memory location is presumed to be equivalent to some other
+memory location, so long as each designates a sufficiently large area.
+
+A third common kind of function with an "internal" side effect is one
+that "memoizes" answers.  That is, if you call it with the same
+parameters as provided recently, the answer is delivered from some
+internal cache, rather than being recomputed.   Again, the exact order
+in which two successive calls are processed makes little difference in
+the overall result of the program, since the expensive computation is
+performed only once, and both calls get the same answer.
+
+Let's contrast this with functions with explicit OUT parameters. Now we
+have a case where any expression that involves a call on such a function
+where the OUT parameter happens to appear as an IN parameter elsewhere
+in the same expression is almost certain to have significant order
+dependence.  But there is nothing necessarily visible at the call site
+that such an order dependence exists.  In languages like C, a classic
+example of this is:
+
+   if (a[i] == b[i++]) { ...
+
+Here there is an evaluation order dependence because "i" appears as an
+"out" parameter of the postfix "++" operator as well as an "in"
+parameter to the indexing operation.  The natural tendency to read this
+left-to-right is misleading, since there is no requirement that the update
+implied by the postfix "++" occur after the evaluation of the address
+represented by "a[i]". At least in this case the reader can see that the
+operator being called is one of those nasty operators with an OUT
+parameter, so you can hopefully be warned off your natural left-to-right
+evaluation assumptions.
+
+In Ada 201Z, let's presume a programmer writes a post-increment function
+using the new cool function OUT parameter capability:
+
+    if A(I) = B(Post_Inc(I)) then ...
+
+Gee, that looks pretty nice, and oh so economical.  Unfortunately, it
+now has a nasty evaluation order dependence.
+
+One of the great features of function calls as opposed to procedure
+calls is that they can be used in the middle of expressions.  But Ada
+makes no attempt to legislate the order of evaluation of expressions.
+You can contrast this with Java, where a left-to-right order is
+specified.  Personally, I prefer Ada's choice.  Authorizing the
+programmer to rely on a strict left-to-right order of evaluation allows
+the programmer to play tricks that can make things harder for the reader
+to understand.  Interestingly, Java (unlike C++) does not allow
+user-defined functions with OUT parameters (in C++, they are called
+"ref" parameters, but they are essentially equivalent), though it is C++
+that allows arbitrary order of evaluation of expressions. (Java does
+still have postfix "++" style operators, so a left-to-right evaluation
+order presumably helps there.)
+
+The Ada 95 "solution" to the OUT parameter issue was "access" parameters.
+These are admittedly a bit heavy weight, because they require "aliased"
+on the variable, and "'Access" at the call point.  But perhaps that is
+not so bad.  By marking something as "aliased," we can see that it
+might be undergoing updates through multiple paths, and hence evaluation
+order dependencies are highly possible.  Similarly, by using 'Access
+at the call point, we again make the potential order dependence visible
+at the place where they are most likely to occur.  Hence, our Ada 2005
+programmer who wants a Post_Inc function will have to write:
+
+    I : aliased Index_Type;
+
+    ...
+
+
+    if A(I) = B(Post_Inc(I'Access)) then ...
+
+Now hopefully some kind of alarm bell will go off.  At least someone
+reviewing this code could do a mechanical search for "'Access" and/or
+"aliased" to narrow down the places to look for these kinds of order
+dependences.
+
+One argument in favor of OUT parameters over this sort of "access"
+parameter approach seems to be one of writer convenience over reader
+visibility.  There obviously is a tradeoff here, but clearly Ada's
+very heavyweight Unchecked_Conversion approach is an example where
+the choice has been made for reader visibility over writer convenience.
+
+There are perhaps other non-convenience arguments in favor of OUT
+parameters over "access" parameters, but I'll leave those for others
+to provide or resurrect.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Monday, January 23, 2006  2:32 AM
+
+Thanks for a very useful write-up.  It should definitely be appended to AI
+323.
+
+I believe that you lie, though, when you write:
+
+> Hence, our Ada 2005 programmer who wants a Post_Inc function will have
+> to write:
+>
+>    I : aliased Index_Type;
+>
+>    if A(I) = B(Post_Inc(I'Access)) then ...
+>
+> Now hopefully some kind of alarm bell will go off.
+
+First, note that in Ada 95 objects allocated in the heap are implicitly
+aliased, so if you use an access type you don't have to write 'Access, and
+the alarm bell is much harder to hear:
+
+	I : Access_Integer;
+
+	if A (I.all) = B (Post_Inc (I).all) then
+
+Arguably the .all should draw your attention, but things become worse if
+both A and B take access parameters:
+
+	if A (I) = B (Post_Inc (I)) then
+
+So it's not like someone looking only at the call site will have a clear
+idea that dependences on the order of evaluation exist.
+
+Second, note that in Ada 2005 this Taft person invented the prefixed view,
+which gives you an *implicit* 'Access.  So if, instead of looking at a
+plain integer, you look at an object of a tagged type, you can write:
+
+	type T is tagged ...;
+	function A (Y : T) return ...;
+	function B (Y : T) return ...;
+	function Post_Inc (Z : access T) return T;
+
+	procedure P (X : T) is
+	begin
+	   if A (X) = B (X.Post_Inc) then
+
+No alarm bells at all here!  No Access, no aliased.  In fact following
+your argument the prefixed view should be consider harmful, because now
+anything that looks like a component selection can do arbitrary side
+effects:
+
+	if X.Foo = X.Bar then -- possible dependence on evaluation order!
+
+My point is that, despite what you seem to believe, we are already in a
+deep pit and we might as well bite the bullet and recognize that arbitrary
+side effects could happen without giving a clue to the reader.  In fact,
+if we added some form of pure function we would be helping the reader in
+many cases, because they could look at the specification of the function
+without having to do extensive analysis of the call sites.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Monday, January 23, 2006  7:32 AM
+
+There are always two sides to the "slippery slope" argument.
+One is that we might as well slide all the way to the
+bottom, the other is that we should dig in our ice axes
+and preserve what's left.  OUT parameters will make this
+all very easy and common, significantly increasing the
+likelihood of unanticipated evaluation order dependences.
+Access types, tagged types, and aliased objects already
+provide plenty of ways to get side-effects when you need
+them.  No need to make it any easier to create evaluation
+order dependencies.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Monday, January 23, 2006  9:54 PM
+
+I think most people would agree that evaluation order dependencies are bad.
+
+But the problem here is that Ada has selected the worst possible solution: it
+allows them, does not detect them in any way, and then says that if you depend
+on one (even if by accident), your code will fail at some point in the future
+when you compiler changes. Yuck.
+
+It would be better to realize that we cannot ban order dependencies, so we
+should at least make the code well-defined (which means mandating left-to-right
+evaluation). And we should give the programmer tools to avoid them (real pure
+functions).
+
+The "unspecified" order in Ada doesn't really make the code better the majority
+of the time, and it certainly makes programming in Ada more dangerous than it
+has to be. I'd figure that the odds that I have order dependencies in my code
+is probably nearly 100% (pretty much like erroneous code).
+
+Effectively, you're using two bugs in the language (unspecified order of
+evaluation and no way to ensure that functions don't have side-effects) to
+justify a third bug (out parameters for function). Three wrongs don't make a
+right, or even less wrong!!!
+
+****************************************************************
+
+From: John Barnes
+Sent: Tuesday, January 24, 2006  2:22 AM
+
+> The "unspecified" order in Ada doesn't really make the code better the
+> majority of the time, and it certainly makes programming in Ada more
+> dangerous than it has to be. I'd figure that the odds that I have order
+> dependencies in my code is probably nearly 100% (pretty much like
+> erroneous code).
+
+The one thing I regretted about RTL/2 which I devised some 35 years ago was
+that I did not specify order of evaluations. This was to help with
+optimization. It was I believe the only serious risk of non-portability.
+
+> Effectively, you're using two bugs in the language (unspecified order of
+> evaluation and no way to ensure that functions don't have side-effects) to
+> justify a third bug (out parameters for function). Three wrongs don't make
+> a right, or even less wrong!!!
+
+Bob Duff just showed me a trick attributed to JPR for doing things like
+random number generators. I am shocked that the absence of in out parameters
+has driven people to do that sort of thing.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Tuesday, January 24, 2006  7:03 AM
+
+We seem to have drifted a bit.  As far
+as unspecified order of evaluation, I am
+more on Pascal's side here.  I was inspired
+by Dijkstra's "Discipline of Programming"
+I suspect.  His mini-language uses
+non-determinacy heavily, where every "if"
+and "while" statement non-deterministically
+chooses among alternatives that are allowed
+to overlap.  E.g., for "max":
+
+     if
+       X >= Y  =>  return X;
+       Y >= X  =>  return Y;
+     fi
+
+Dijkstra's claim is that removing determinancy
+rules actually make program proofs simpler.
+
+Furthermore if any programmer takes advantage
+of determinancy rules about order of
+evaluation, etc., is creating a program that
+may be harder to understand in general.
+
+I guess my preference would be the SPARK approach,
+where order of evaluation is unspecified but
+the language rules try to statically eliminate
+cases where order of evaluation matters.
+
+Pascal's suggestion of disallowing a given
+object to be used more than once in a single
+expression if any of the uses are [IN] OUT
+parameters is an example.  Unfortunately,
+such rules can be complex, and may be more
+appropriate for a "full" static analyzer
+(like our "Inspector" product, or the SPARK
+Examiner).
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Tuesday, January 24, 2006  7:35 AM
+
+>      if
+>        X >= Y  =>  return X;
+>        Y >= X  =>  return Y;
+>      fi
+>
+> Dijkstra's claim is that removing determinancy
+> rules actually make program proofs simpler.
+
+I am not familiar with Dijkstra's language, but I have used Prolog quite a
+bit, and that's a fully nondeterministic language.  As in your example
+above, you would just describe all the possible alternatives, and it would
+pick one at execution.  The beautiful thing was that you didn't have to
+break the symmetry of the problem at hand like you do constantly in
+procedural languages (e.g. with the if statement).  The drawback of course
+was that any real-life algorithm ended up exponential, but that's a
+detail.
+
+> Furthermore if any programmer takes advantage
+> of determinancy rules about order of
+> evaluation, etc., is creating a program that
+> may be harder to understand in general.
+
+Absolutely.
+
+> Pascal's suggestion of disallowing a given
+> object to be used more than once in a single
+> expression if any of the uses are [IN] OUT
+> parameters is an example.  Unfortunately,
+> such rules can be complex...
+
+True, but it all depends on how far you want to go.  Remember, if you make
+the rules too simple, the worst that can happen is that the user has to
+introduce a temporary.  My gut feeling is that you would want to use type
+information, aliasedness, and probably not much more (in particular, no
+data flow analysis).  We are probably talking a complexity similar to that
+of the accessibility rules.  No big deal ;-)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Tuesday, January 24, 2006  1:48 PM
+
+> > Pascal's suggestion of disallowing a given
+> > object to be used more than once in a single
+> > expression if any of the uses are [IN] OUT
+> > parameters is an example.  Unfortunately,
+> > such rules can be complex...
+>
+> True, but it all depends on how far you want to go.  Remember, if you make
+> the rules too simple, the worst that can happen is that the user has to
+> introduce a temporary.  My gut feeling is that you would want to use type
+> information, aliasedness, and probably not much more (in particular, no
+> data flow analysis).  We are probably talking a complexity similar to that
+> of the accessibility rules.  No big deal ;-)
+
+I think this would be a reasonable compromise, presuming the rules could be
+worked out. It would seem that such a check would only need to apply to
+parameters that aren't by-reference (although one could make an argument to
+include them). But it would be good to use the same rules for procedure calls,
+and there you wouldn't want to ban by-reference parameters that are the same.
+(Note that you may not be able to have a temporary of a by-reference type.)
+
+In any case, my point is that if you are worried about side-effects in
+expressions, let's attack *that* problem, not eliminate the possibility of
+caching results in objects, writing sane random number generators, writing
+interfacing code that maps the meaning closely, and the like.
+
+'In Out' parameters in functions are *not* a problem. Order-dependent
+expressions are a problem! Banning the former does little to help the latter,
+and it makes a lot of things harder than they need to be.
+
+****************************************************************
+
+From: Pascal Leroy
+Sent: Wednesday, January 25, 2006  3:17 AM
+
+> (Note that you may
+> not be able to have a temporary of a by-reference type.)
+
+True, and the same holds for limited types.  But then you could probably
+use renamings.  Remember, the only thing you have to do is to break
+complex expressions into individual declarations or statements in order to
+spell out the evaluation order.
+
+It would be an interesting exercise to try to spell out the rules and see
+where this leads us.  But not today...
+
+> 'In Out' parameters in functions are *not* a problem.
+> Order-dependent expressions are a problem! Banning the former
+> does little to help the latter, and it makes a lot of things
+> harder than they need to be.
+
+I couldn't agree more.
+
+****************************************************************
+

Questions? Ask the ACAA Technical Agent