CVS difference for ai05s/ai05-0144-1.txt

Differences between 1.1 and version 1.2
Log of other versions for file ai05s/ai05-0144-1.txt

--- ai05s/ai05-0144-1.txt	2009/02/15 07:52:06	1.1
+++ ai05s/ai05-0144-1.txt	2009/02/18 05:43:15	1.2
@@ -1,4 +1,4 @@
-!standard 6.04 (09)                                 09-02-15  AI05-0144-1/00
+!standard 6.04 (09)                                 09-02-17  AI05-0144-1/01
 !class Amendment 09-02-15
 !status work item 09-02-15
 !status received 09-02-15
@@ -34,43 +34,300 @@
-[Editor's note: There hopefully will be a fuller writeup of this AI in a few days.]
+In order to discuss this topic, we need to look at some examples.
-Show some examples of what we're talking about.
+    type A_Rec is {tagged} record
+          C : Natural := 0;
+       end record;
+    -- Note: We're going to talk about this record both as if it is
+    -- tagged, and then as if it is not tagged and passed by copy.
+    -- Thus the {tagged} here.
+    procedure P1 (Input : in A_Rec; Output : in out A_Rec) is
+    begin
+        Output.C := Output.C + Input.C;
+    end P1;
-** Survey of problems and possible solutions **
+    function F1 (Bumpee : access A_Rec) return A_Rec is
+    begin
+        Bumpee.C := Bumpee.C + 1;
+        return (C => Bumpee.C - 1);
+    end F1;
+    function F2 (Bumpee : in out A_Rec) return A_Rec is
+    begin
+        Bumpee.C := Bumpee.C + 1;
+        return (C => Bumpee.C - 1);
+    end F2;
-Possible solutions:
+    function F3 (Obj : in A_Rec) return Natural is
+    begin
+        return Obj.C;
+    end F3;
-Specifying the order of evaluation: 
-Could require left-to-right evaluation
-(like Java) - but that encourages tricky code by making it portable.
+    function F4 (Obj : access A_Rec) return Natural is
+    begin
+        return Obj.C;
+    end F4;
-Punt: This is what Ada currently does. But this is used as an excuse for not
-allowing "in out" parameters for functions (causing all kinds of headstands).
-An option would be to add "in out" only for tagged types (no semantic
-change for that from anonymous access) -- but that's pretty warty. 
+    O1 : {aliased} A_Rec := (C => 1); -- "aliased" only if needed by
+                                      -- the particular example below.
-Increasing visibility of parameters with side-effects.
+    type Acc_A_Rec is access all A_Rec;
+    A1 : Acc_A_Rec := O1'access;
-Options for detecting some or all of the problem:
+The usual concern about the use of *in out* parameters in functions begins something
-Note that making the rules too simple only means that temporaries need to
-be introduced in expressions. That would be an annoyance, not the current
-ticking time-bomb.
+Imagine writing an expression like:
-Blame it all on functions with "in out" parameters - make them stand alone.
+    if F3(O1) = F3(F2(O1)) then
-Rules based on types alone (or almost alone).
+This expression has an evaluation order dependency: if the expression is evaluated
+left-to-right, the result is True (both values have (C => 1) and O1.C is set to 2
+afterwards), and if the expression is evaluated right-to-left, the result is False
+(the right operand is still (C => 1), but now the left operand is (C => 2), and O1.C
+is still 2 afterwards).
-Rules based on names and types.
+This is usually used as a reason to not allow *in out* parameters on functions.
+If you have to use access parameters, then the expression is:
-Compatibility? Should the rules apply to all calls or just functions?
+    if F3(O1) = F3(F1(O1'access)) then
-By-copy vs. by-reference. Should we check both or only the first?
+and the use of 'access and aliased on the declaration of O1 should provide a red flag
+about the possible order dependence.
+However, this red flag only occurs some of the time. First of all, access objects
+are implicitly converted to anonymous access types, so no red flag is raised when
+using them:
+    if F3(A1.all) = F3(F1(A1)) then
+Perhaps the .all on the left-hand argument could be considered a red flag. But of
+course that doesn't apply if that function also takes an access parameter:
+    if F4(A1) = F3(F1(A1)) then
+We have the same order dependency, but there is no sign of a red flag here.
+This is all Ada 95 code, but Ada 2005 makes this situation worse by adding prefix
+views and implicit 'access. If A_Rec is tagged, we can write:
+    if O1.F3 = O1.F1.F3 then
+And since tagged parameters are implicitly aliased, if O1 is a tagged parameter,
+there isn't the slightest sign of a red flag here.
+This shows that we are already in a very deep pit. One can argue whether moving
+the rest of the way to the bottom is that significant. [Thanks to Pascal Leroy for
+showing just how bad the current situation is.]
+We can show similar problems with procedure calls. Consider:
+    P1 (Input => F2(O1), Output => O1);
+If O1 is tagged (and thus passed by reference), this will set O1.C to 3 in
+either order of evaluation. (In both cases, Input will have (C => 1) and
+output will have (C => 2)).
+But if O1 is not tagged (and thus passed by copy), this will set O1.C to 3
+if evaluated left-to-right [F2 sets O1.C to 2, then passes (C => 1) to Input;
+Output is passed (C => 2); and then that summed to (C => 3)] and O1.C will
+be 2 if evaluated right-to-left [Output is passed (C => 1); F2 sets O1.C to 2,
+then passes (C => 1) to Input; and then that is summed to (C => 2)].
+We can write similar code in Ada 95:
+    P1 (Input => F1(A1), Output => A1.all);
+getting the same order dependence.
+More generally, an expression can only have an order dependency if there is
+something that is passed by-copy (or otherwise creating a new object as part
+of the expression). Recall that all function results create a new object,
+even for tagged types. This fact suggests ways to limit rules (discussed later).
+[Editor's note: Is this really true? I've convinced myself that it is, because
+in any such case, any updates are reflected immediately on the object - and the
+order of modifications within the subprogram body is determined by the code.
+Or are reorderings permitted? I would think not, presuming that the ultimate
+result is printed out or otherwise considered an "external interaction". Note
+that it is *not* a bounded error to modify an object by distinct access path
+if the object is a by-reference type (6.2(12)). Indeed, it might actually take
+two such newly created objects to create an order dependence, I haven't been
+able to come up with a case envolving only one newly created object -- I think
+that's because such a newly created object can't be passed as an in out
+parameter. Note that I haven't been able to find any use for this information,
+so maybe I should just dump it.]
+Questions to answer:
+Should the rules (whatever they are) apply to all calls or just functions with
+*in out* parameters? The latter position clearly is completely compatible,
+but it obviously leaves many cases of order dependence undetected.
+Another issue is whether the rules should apply to all writable parameters
+(that is all *in out* and *out* parameters, and all access-to-varaible
+parameters), or just to writable by-copy parameters (which have to be *in out*
+or *out*). Clearly, problems can happen in any of these cases. But Ada has
+lived with the by-reference cases for decades, and adding *in out* parameters
+to functions doesn't change anything here (for a by-reference parameter,
+an *in out* parameter is equivalent to an access parameter of the same type).
+As noted in the discussion above, the real problems occur when new objects
+are created by intermediate expressions. "By-copy" in this context still means
+any parameter that *might* be passed by-copy: any type that is not known to be
+a by-reference type (clearly including untagged private types). It also has
+to include return objects of all types.
+Note that these two issues are not completely independent of each other:
+limiting checks to by-copy parameters also limits the added incompatibility,
+(and the likelyhood that the rules are really preventing errors).
+Survey of solutions
+It fairly obvious that that order dependencies are a problem in Ada, and
+have been getting worse with each version of the language (even without
+*in out* parameters in functions). Moreover, it has been used as the
+primary reason for leaving something natural and useful (*in out* parameters
+for functions) out of the language.
+We could do nothing, saying that since we're already nearly at the bottom
+of the pit, moving down two inches to the bottom will not have any practical
+effect. It surely would be easier than trying to cling to the side of the
+pit! But perhaps we can do better.
+It should be noted immediately that (almost) anything we could do could not
+detect order dependencies that are caused by side-effects *inside* of
+functions. Rules enforced on a call (or syntax, or whatever) can only
+prevent problems caused by side-effects visible to the call. 
+We could limit *in out* parameters for functions to by-reference types
+(effectively to only tagged types). That clearly will not introduce any *new*
+problems, as shown by the examples that start out this section. But argubly
+it would make them more likely, and in any event, it would leave all of the
+existing nasty cases left unchecked.
+One obvious solution would be to define the order of evaluation, eliminating
+the problem at the source. Java, for instance, requires left-to-right
+evaluation. However, that would encourage tricky code like the various
+examples by making it portable. Moreover, Ada compilers have been using this
+flexibility for decades; trying to remove it from compilers (particularly
+from optimizers) could be very difficult. Note that this is the only solution
+that actually could eliminate dependencies on side-effects inside of functions.
+But defining the order of evaluation was considered for both Ada 83 and Ada 95
+and was deemed not worth it -- it's hard to see what has changed.
+Another option would be to increase the visibility of parameters with
+side-effects. This sounds somewhat appealing (after all, it seems to be
+the basis on which access parameters are deemed OK and *in out* parameters
+are not). One possibility would be to add ordering symbols to named notation:
+Param <- <expr> for an *in* parameter (this includes access); Param -> <expr>
+for an *out* parameter; and Param <-> <expr> for an *in out* parameter.
+However, for compatibility, old code that don't use the symbols would
+have to be allowed. That's especially bad because the symbol for ordinary
+named parameters (=>) looks like the symbol for an *out* parameter; while it
+usually will be an *in* parameter. Moreover, this solution does nothing for
+positional parameters in calls.
+One could imagine requiring the syntax for calls of functions with *in out*
+parameters and making it optional elsewhere. That might placate *in out*
+parameter opponents, but otherwise doesn't seem to do much for the language.
+Finally, we come to some sort of legality rules and/or runtime checks for
+preventing such order dependencies. It is important to note that making such
+rules too simple (and strong) only would mean that temporaries have to be
+introduced in some expressions. That would be an annoyance, but surely not
+as bad as the current ticking time-bomb.
+The easiest option is to blame all of the problems on functions with *in out*
+parameters and make them stand alone. The rule would be something like:
+   A call of a function with an *in out* parameter must be the only call in
+   an expression.
+That would mean that
+     if F2(O1) then
+would be legal (assuming F2 returned type Boolean), but
+     if F2(O1) = (C => 1) then
+would not be. Obviously, this is too strict. Amazingly, it also not strict
+     Some_Array(F3(O1)) := F2(O1);
+would be allowed. (An assignment statement is *not* an expression!)
+   A call of a function with an *in out* parameter must be the only call in
+   an expression or statement;
+   If a call of a function with an *in out* parameter is the source expression
+   of an assignment_statement, the target variable_name shall not include a
+   call, an indexed component, a slice, or a dereference.
+This would allow assigning to temporaries and record components (which can't be
+computed) and not much else.
+Of course, this is completely compatible with Ada 95 and later; but it doesn't
+do anything to detect the existing cases of problems. Maybe that's not important.
+In order to have an order dependence, there has to be two (or more) uses of an single
+object within an expression (or statement). But of course that object could be aliased
+via parameter passing or access values, a part of a larger object, or computed (as in
+an array component). In addition, one of the uses of the object has to be modified
+(that is, it is passed as an *in out* or *out* parameter, or it is passed as the
+designated object of an access type parameter). Finally, in the smallest call
+(subexpression) that contains both uses of the object (that is the call where the
+order dependence can occur), the use that modifies the object cannot be directly a
+parameter of that call.
+This last part is shown by a procedure call like (we're using the declarations at
+the head of this section again):
+    P1 (Input => (C => F3(O1)), Output => O1);
+Since the use that modifies O1 is in the top-level procedure call, it won't be
+modified until that call is underway. The other parameter(s) will have been evaluated
+by then.
+Turning this into rules is not hard, except for defining a "single object". The easiest
+way to do that is to simply use type information:
+Two objects are considered to be "potentially the same object" if one has the type
+of a part of the other, or one object's type is a partial view. [The last part is to
+avoid privacy issues - we don't want to be looking inside of private types to figure
+this out, and we want to asssume the worst. Remember that 'part' includes the type
+A call is legal only if:
+   For each name N that is passed to some inner call (not including the call itself)
+   as the actual parameter to a formal in out or out parameter, or is passed as
+   the designated object of a formal parameter of an access type, there is no
+   other name in the expressions of the actual parameters of the call other than
+   the one containing N that denotes potentially the same object.
+For the purpose of checking this rule, an assignment_statement is considered a call
+with two parameters.
+[Randy: I ran out of time and energy here, just as I was getting to the good part.]
+TBD: This is obviously too strict. Show some examples why. Also, reason out the
+rules for private types (they're probably not correct; they don't seem to handle
+components of a private type; also, try to figure out if the contents actually
+matter -- they may not as we're only interested in what is visible at the point
+of the call, right?). Try to relax the rules by using more information about
+when objects could be "potentially the same object", such as local variables that
+aren't aliased, pool-specific access types, etc.
+Consider Runtime checks if we don't want to make have to assume two array components
+are the same??
+End TBD.

Questions? Ask the ACAA Technical Agent