CVS difference for ai05s/ai05-0144-1.txt

Differences between 1.2 and version 1.3
Log of other versions for file ai05s/ai05-0144-1.txt

--- ai05s/ai05-0144-1.txt	2009/02/18 05:43:15	1.2
+++ ai05s/ai05-0144-1.txt	2009/04/30 05:52:16	1.3
@@ -1,4 +1,4 @@
-!standard 6.04 (09)                                 09-02-17  AI05-0144-1/01
+!standard 6.04 (09)                                 09-04-29  AI05-0144-1/02
 !class Amendment 09-02-15
 !status work item 09-02-15
 !status received 09-02-15
@@ -8,7 +8,7 @@
 
 !summary
 
-** TBD **
+Define rules to reject expressions with obvious cases of order-dependence.
 
 !problem
 
@@ -27,11 +27,77 @@
 
 !proposal
 
-** TBD **
+(See wording.)
 
 !wording
 
+[I don't know where best to put this, perhaps in 6.4? - RLB]
 
+A name N1 is *known to denote the same object* as another name N2 if:
+  * N1 statically denotes a part of a stand-alone object or parameter, and
+    N2 statically denotes the same part of the same stand-alone object or
+    parameter; or
+    [We're assuming that the first bullet covers selected_components, as
+     those are always known at compile-time - ED]
+  * N1 is a dereference P1.all, N2 is a dereference P2.all, and
+    the prefix P1 is known to denote the same object as the prefix P2; or
+  * N1 is an indexed_component P1(I1,...), N2 is an indexed_component
+    P2(I2,...), the prefix P1 is known to denote the same object as the
+    prefix P2, and for each index of the indexed_component, I1 and I2 are
+    static expressions with the same value, or I1 and I2 are names that
+    are known to denote the same object.
+
+AARM Discussion: This is determined statically. If the name contains some dynamic
+portion other than a dereference or indexed_component, it is not "known to denote
+the same object". [We could also do slices here, although it doesn't seem
+likely enough to bother with.]
+
+A name N1 is *known to denote a prefix of the same object* as another name N2 if
+N2 is known to denote the same object as a subcomponent of the object denoted by N1.
+
+AARM Reason: This ensures that names Prefix.Comp and Prefix are known to
+denote the same object for the purposes of the rules below. This intentionally does
+not include dereferences.
+
+A call C is legal only if:
+ *  For each name N that is passed to some inner call (not including the call C
+    itself) as the actual parameter to a formal in out or out parameter, there
+    is no other name anywhere in the expressions of the actual parameters of the call
+    other than the one containing N that is known to denote the same object or is
+    known to denote a prefix of the same object; and
+ *  For each name N that is passed to some inner call (not including the call C
+    itself) as the actual parameter of a formal parameter of an access type, there
+    is no other name anywhere in the expressions of the actual parameters of the call
+    other than the one containing N that is known to denote the same object as N.all;
+    and
+ *  for each name N that is passed as the actual parameter to a formal in out or out
+    parameter that is not of a by-reference type, there is no other name in the
+    actual parameters corresponding to formal in out or out parameters of the call
+    other than the one containing N that is known to denote the same object or is
+    known to denote a prefix of the same object.
+
+For the purpose of checking this rule, an assignment_statement is considered a call
+with two parameters, the parameter corresponding to the source expression having mode
+in and the parameter corresponding to the target name having mode out.
+
+AARM Reason: This prevents obvious cases of dependence on the order of
+evaluation of parameters in expressions. Such dependence is usually a bug, and
+in any case, is not portable to another implementation (or even another
+optimization setting).
+
+The second bullet does not check for uses of the prefix that are not dereferences,
+as that is the access type, and we cannot know any uses of that type actually do
+a dereference. Only dereferences of the access value expose the application to order
+dependencies.
+End AARM Reason.
+
+AARM Ramification: Note that first two bullets cannot fail for a procedure or entry
+call alone; there must be at least one function with an access, in out, or out
+parameter called as part of a parameter expression of the call in order for it
+to fail.
+
+
+
 !discussion
 
 In order to discuss this topic, we need to look at some examples.
@@ -48,6 +114,12 @@
         Output.C := Output.C + Input.C;
     end P1;
 
+    procedure P2 (Bump1, Bump2 : in out A_Rec) is
+    begin
+        Bump1.C := Bump1.C + 1;
+        Bump2.C := Bump2.C + 2;
+    end P2;
+
     function F1 (Bumpee : access A_Rec) return A_Rec is
     begin
         Bumpee.C := Bumpee.C + 1;
@@ -70,8 +142,29 @@
         return Obj.C;
     end F4;
 
+    function F5 (Bump1, Bump2 : access A_Rec) return A_Rec is
+    begin
+        Bump1.C := Bump1.C + 1;
+        Bump2.C := Bump2.C + 2;
+        return (C => Bump1.C - 1);
+    end F5;
+
+    function F6 (Bump1, Bump2 : in out A_Rec) return A_Rec is
+    begin
+        Bump1.C := Bump1.C + 1;
+        Bump2.C := Bump2.C + 2;
+        return (C => Bump1.C - 1);
+    end F6;
+
+    function F7 (Bumpee : in out A_Rec; Bumper : in A_Rec) return A_Rec is
+    begin
+        Bumpee.C := Bumpee.C + Bumper.C;
+        return (C => Bumpee.C);
+    end F6;
+
     O1 : {aliased} A_Rec := (C => 1); -- "aliased" only if needed by
                                       -- the particular example below.
+    O2, O3 : A_Rec := (C => 1);
 
     type Acc_A_Rec is access all A_Rec;
     A1 : Acc_A_Rec := O1'access;
@@ -141,26 +234,43 @@
     P1 (Input => F1(A1), Output => A1.all);
 
 getting the same order dependence.
+
+We can also get an order dependence from a single call (even in Ada 83):
 
-More generally, an expression can only have an order dependency if there is
-something that is passed by-copy (or otherwise creating a new object as part
-of the expression). Recall that all function results create a new object,
-even for tagged types. This fact suggests ways to limit rules (discussed later).
-
-[Editor's note: Is this really true? I've convinced myself that it is, because
-in any such case, any updates are reflected immediately on the object - and the
-order of modifications within the subprogram body is determined by the code.
-Or are reorderings permitted? I would think not, presuming that the ultimate
-result is printed out or otherwise considered an "external interaction". Note
-that it is *not* a bounded error to modify an object by distinct access path
-if the object is a by-reference type (6.2(12)). Indeed, it might actually take
-two such newly created objects to create an order dependence, I haven't been
-able to come up with a case envolving only one newly created object -- I think
-that's because such a newly created object can't be passed as an in out
-parameter. Note that I haven't been able to find any use for this information,
-so maybe I should just dump it.]
+    P2 (O1, O1);
 
+but only if there are multiple parameters that can modify an object, and
+interestingly, only if the parameters are passed by-copy. (If A_Rec is
+tagged, for example, the value of O1.C will be increased by 3, no matter
+what order the parameters are evaluated in.)
+
+That means that the Ada 95 call:
+    P1 (Input => F5 (A1, A1), Output => O1);
+cannot have an order dependence, but *in out* parameters:
+    P1 (Input => F6 (O1, O1), Output => O2);
+could, but only if A_Rec is passed by copy.
+
+Note that a single call with only one modifiable parameter cannot have an
+order dependence:
+
+    P1 (Input => O1, Output => O1);
+
+will always end up with the same result, not matter what order the parameters
+are evaluated in. (That result could depend on the parameter passing mode, but
+that is controllable by using parameters of types that are by-copy or by-reference
+and in any case is not the problem we are discussing here). The parameters will
+be evaluated before the call; for by-copy they'll both have the value (C => 1)
+and the result value of (C => 2) will be written after the call. No part of the
+expression will use the modified value after it is modified, so there cannot be
+a dependence.
+
+Similarly,
+   P1 (Input => F7 (O1, O1), Output => O2);
+does not have an order dependence, as again, no part of the expression could
+depend on the modified value of O1.
 
+
+
 Questions to answer:
 
 Should the rules (whatever they are) apply to all calls or just functions with
@@ -230,13 +340,32 @@
 have to be allowed. That's especially bad because the symbol for ordinary
 named parameters (=>) looks like the symbol for an *out* parameter; while it
 usually will be an *in* parameter. Moreover, this solution does nothing for
-positional parameters in calls.
+positional parameters in calls nor for the prefixes of prefix notation. And
+it is misleading for access parameters, whose mode is officially "in",
+but still might cause side-effects.
+
+One could argue that positional parameters are already unsafe and requiring
+named notation to be safe is not much of an imposition. But the prefix and
+access issues are not so easily explained away. Additionally, putting the
+mode into calls in some way makes maintenance of programs harder: changing
+the mode of a call is going to make many calls illegal, while today most calls
+will remain legal (all will if the mode is changed from "in out" to "in").
+
+Another syntax suggestion that was made recently was to (optionally) include
+the parameter mode as part of the call. That would look something like:
+
+    if F3(in O1) = F3(in F2(in out O1)) then
 
+This could be applied to positional calls as well, but still provides no
+help for prefix calls nor for access parameters.
+
 One could imagine requiring the syntax for calls of functions with *in out*
 parameters and making it optional elsewhere. That might placate *in out*
 parameter opponents, but otherwise doesn't seem to do much for the language.
 
 
+
+
 Finally, we come to some sort of legality rules and/or runtime checks for
 preventing such order dependencies. It is important to note that making such
 rules too simple (and strong) only would mean that temporaries have to be
@@ -282,12 +411,16 @@
 via parameter passing or access values, a part of a larger object, or computed (as in
 an array component). In addition, one of the uses of the object has to be modified
 (that is, it is passed as an *in out* or *out* parameter, or it is passed as the
-designated object of an access type parameter). Finally, in the smallest call
-(subexpression) that contains both uses of the object (that is the call where the
-order dependence can occur), the use that modifies the object cannot be directly a
-parameter of that call.
+designated object of an access type parameter). Finally, one of the following has
+to be true:
 
-This last part is shown by a procedure call like (we're using the declarations at
+* in the smallest call (subexpression) that contains both uses of the object (that
+  is the call where the order dependence can occur), the use that modifies the object
+  cannot be directly a parameter of that call; or
+* both uses are parameters to the same call; both uses could modify the object,
+  and neither parameter is required to be passed by reference.
+
+The first bullet is shown by a procedure call like (we're using the declarations at
 the head of this section again):
 
     P1 (Input => (C => F3(O1)), Output => O1);
@@ -295,39 +428,220 @@
 Since the use that modifies O1 is in the top-level procedure call, it won't be
 modified until that call is underway. The other parameter(s) will have been evaluated
 by then.
+
+The second bullet provides the only exception to the first bullet; it covers cases
+with two parameters, both of which can modify the object, and at least one of which
+could be passed by copy.
 
-Turning this into rules is not hard, except for defining a "single object". The easiest
-way to do that is to simply use type information:
 
+Turning this into rules is not hard, except for defining a "a single object".
+The easiest way to do that is to simply use type information:
+
 Two objects are considered to be "potentially the same object" if one has the type
-of a part of the other, or one object's type is a partial view. [The last part is to
-avoid privacy issues - we don't want to be looking inside of private types to figure
-this out, and we want to asssume the worst. Remember that 'part' includes the type
-itself.]
+of a part of the other. [Remember that 'part' includes the type itself. The only
+parts considered for this rule are those that are visible at the point of the call,
+as is typical for legality rules.]
 
 A call is legal only if:
    For each name N that is passed to some inner call (not including the call itself)
    as the actual parameter to a formal in out or out parameter, or is passed as
    the designated object of a formal parameter of an access type, there is no
    other name in the expressions of the actual parameters of the call other than
-   the one containing N that denotes potentially the same object.
+   the one containing N that denotes potentially the same object; and
+   for each name N that is passed as the actual parameter to a formal in out or out
+   parameter that is not of a by-reference type, there is no other name in the
+   actual parameters corresponding to formal in out or out parameters of the call
+   other than the one containing N that denotes potentially the same object.
 
 For the purpose of checking this rule, an assignment_statement is considered a call
-with two parameters.
+with two parameters, the source parameter having mode in.
+
+
+This is clearly too strict, as calls with obviously different objects would be illegal.
+For instance, 
+   P1 (Input => F2(O2), Output => O1);
+fails this proposed rule. That would annoy programmers intensely, as it is crystal-clear
+that there is no conflict in this call. Moreover, this could interfere with using
+temporaries to break up expressions that otherwise would violate the rules. So this
+rule needs improvement.
+
+There is a also a problem with private types. If a private type has an aliased
+component, it is possible for an access type (presumably returned from some operation
+on the private type) to designate that component. But that fact that the private
+object had such a part with the appropriate part would not be known at the point of
+the call. That could happen, for instance, in a container accessor.
+
+    A1 := Accessor(A_Rec_Container);
+    PX (A_Rec_Container, F2 (A1.all));
+
+A1.all designates part of A_Rec_Container, and (if we are trying to catch all such cases),
+the call to PX should be illegal.
+
+So we could modify the definition of "potentially the same object" to:
+
+Two objects are considered to be "potentially the same object" one has the type
+of a part of the other, or one has a part whose type is a partial view, unless:
+   * one object is part of a stand-alone object, and the other object is part
+     of a different stand-alone object;
+     [Since there are no dereferences here (they're never stand-alone objects),
+      we don't have to worry about private types because the problem cases all
+      involve dereferences designating aliased components. And different
+      objects are otherwise disjoint.]
+   * one object is part of a stand-alone object SO with no parts that are aliased or
+     have types that are partial view, and the other object is not part of SO;
+     [Here we can see the the first object has no aliased parts or private types
+      that could hide aliased parts; in that case, we only care that the second
+      object is not part of the same object.]
+   * one object is a parameter of an elementary type, and the other object
+     is not that parameter or rename thereof;
+     [We don't say "by-copy type" here, as that ignores privacy, which would be
+      wrong for legality rules. A by-copy parameter can't be aliased and
+      cannot represent any other object, so they never are the same as some
+      other object.]
+
+    [In the following cases, we can ignore private types; a type can never
+      be directly a subcomponent of itself, and O2 cannot be a private type
+      or we wouldn't be able to see the subcomponent. We also don't have
+      to worry about different stand-alone objects, as t]
+   * the type of object O1 is that of a (visible) subcomponent of object O2,
+     object O1 is a non-aliased part of a stand-alone object, 
+     and object O2 is not part of the same stand-alone object;
+     [We can only get in trouble here if O2 is part of the same stand-alone
+     object. O2 can be in a storage pool, or any other stand-alone object, as
+     we don't have to worry about aliasing.]
+   * the type of object O1 is that of a (visible) subcomponent of object O2,
+     object O2 is a aliased part of a stand-alone object, 
+     and object O1 is a part of a dereference of a pool-specific access type;
+     [Here we have to worry about accesses to the subcomponent of O2; we don't
+     want O1 to be a dereference that represents that subcomponent. Different
+     stand-alone objects are covered in the first bullet.]
+   * the type of object O1 is that of a (visible) subcomponent of object O2,
+     object O2 is a non-aliased stand-alone object, and object O1 is not part
+     of the same stand-alone object;
+     [Note that we don't talk about object O2 as being a part. That gets messy
+     when private types are taken into account, because we can't necessarily
+     see if the objects have an aliased components of the right type. (And even
+     when we can, the wording is messy.) Otherwise, this is similar to the
+     second bullet.
+   * the type of object O1 is that of a (visible) subcomponent of object O2,
+     object O2 is a aliased part of a stand-alone object, 
+     and object O1 is a part of a different stand-alone object or a part of a
+     dereference of a pool-specific access type;
+     [Here we have to worry about accesses to the subcomponent of O2; we don't
+     want O1 to be a dereference that represents that subcomponent.]
+
+Essentially the idea here is that if the two objects are different stand-alone
+objects, we don't care about their types; if one of the objects is a stand-alone
+object and nothing is hidden or aliased, then the other object can be anything
+other than the same object.
+
+If we can see the relationship between the objects, we can then allow more
+(such as pool-specific dereferences, which can't designate stand-alone objects).
+
+Private types are considered to conflict with all other types.
+
+This gets rid of all of the critical problems; private types can't hide aliasing,
+and temporaries will always work to break up expressions.
+
+However, these rules still have some annoying effects:
+ * Parameters are not stand-alone objects, so parameters are treated as if they could
+   be the same always treated as if they could be the same, if there is any
+   possibility. We do except by-copy parameters, as they can't be aliased and they
+   are effectively new objects.
+ * Changing a type from a visible type to a private type potentially could make
+   some calls illegal. The above rules assume the worst about private types, while
+   as long as the type is visible, the compiler can actually verify that there is
+   no problem. That seems inherent in this model.
+ * There is no attempt to differentiate individual components that have the same
+   type. This can appear for records:
+       P (Complex.Real, F(Complex.Img));
+   would appear to conflict and thus be illegal.
+   But this is much more likely to cause problems for arrays:
+       P (Arr(1), F(Arr(2)));
+   The reader can easily tell that these array component s aren't ever going to be the
+   same, but the compiler can't with the given rules. Of course, if 1 and 2 were
+   replaced by functions, then these should be treated as overlapping.
+
+The component cases could be handled with more complex rules which could determine
+if the components are different parts of the same object. In particular, array
+components with static index expressions can be allowed.
+
+However, the increasing complexity of these rules has led the author to quit at this
+point. Moreover, they still seem to be too incompatible; this is most acute when
+looking at a routine like Swap_Integers (with two in out parameters of type Integer):
+   Swap_Integers (Arr(I), Arr(J));
+This would be illegal by any conceivable set of "complete" rules, since we cannot know
+the values of I and J, and the parameters are not by-reference. As such, this call
+should not be allowed, but that is a likely incompatibility.
+
+We could break the "completeness" by only checking function calls, but that doesn't
+make much sense. There is little semantic difference between functions and procedures,
+and programmers switch between them all the time. Preventing a problem for function
+calls but not preventing the same problem for procedures would be bizarre.
+
+
+A better solution would be to create rules that only try to detect the "low-hanging
+fruit" - that is cases where there clearly is a problem. This would be preferable
+anyway; it would reduce the frustration caused by the rules (compared to rules like
+the accessibility rules, for which the checks generally have no effect except to
+get in the way of what you need to do -- to the point that Ada provides an attribute
+to ignore them!). It would still be possible to cause problems, but at least 
+
+Such rules would only reject calls where it is clear that parts of the same object
+are involved. That eliminates the complications of private types (we won't look in
+them), arrays (we won't try to determine if they are the same), and so on.
+
+These are the rules proposed in the !wording section above.
+
+The proposed wording allow detects arrays indexed by the same value or object,
+dereferences of the same access value, as well as uses of the same object. It
+does not try to detect other cases of problems.
+
+For instance,
+   Swap_Integer (Arr(I), Arr(I));
+is illegal (and is almost certainly a bug -- one I've written periodically!), while
+   Swap_Integer (Arr(I), Arr(J));
+is legal.
+
+
+Additional alternatives
+
+One idea that was considered but discarded was to use run-time checks to deal with
+the complex cases, such as when private types are (usually) hiding the fact that
+checks are not needed, or to verify that array indexes are different when that
+is required.
+
+The problem is that adds a lot of complexity, and while many of the checks could
+be eliminated, some probably would remain, adding overhead without much value.
+(After all, the program probably would work without the checks.)
+
+
+None of the proposed rules do anything about side-effects totally inside of
+functions. One way to deal with that would be to require an expression to contain
+only a single function call unless all of the functions are *strict*. A strict
+function exposes all of its side effects in its specification, meaning it does
+not read or write global variables, call non-strict functions, write hidden parts
+of parameters, etc. Most of the language-defined functions are strict (that would
+have to be declared somehow).
+
+Strict functions have several other nice properties: they don't need elaboration
+checking (freezing is sufficient to prevent access-before-elaboration problems);
+they can be used in symbolic reductions; and they can use the optimizations allowed
+for functions in pure packages.
+
+However, such a requirement would be quite incompatible. Moreover, strict functions
+would be rather limiting by themselves.
+
+An alternative that has been suggested is to allow Global_In and Global_Out
+annotations on subprograms, which would declare the global side-effects of a
+subprogram. Such annotations could not be a lie (they'd have to be checked in some
+way), and thus would fill the role of strict functions more flexibly. But it would
+still be too incompatible to ban dangerous side-effects in functions (although
+separate tools or non-Ada operating modes could make such checks).
+
+
 
-[Randy: I ran out of time and energy here, just as I was getting to the good part.]
 
-TBD: This is obviously too strict. Show some examples why. Also, reason out the
-rules for private types (they're probably not correct; they don't seem to handle
-components of a private type; also, try to figure out if the contents actually
-matter -- they may not as we're only interested in what is visible at the point
-of the call, right?). Try to relax the rules by using more information about
-when objects could be "potentially the same object", such as local variables that
-aren't aliased, pool-specific access types, etc.
-
-Consider Runtime checks if we don't want to make have to assume two array components
-are the same??
-End TBD.
 
 !example
 

Questions? Ask the ACAA Technical Agent