!standard 6.02 (11) 09-06-07 AI05-0144-2/01 !class Amendment 09-06-07 !status work item 09-06-07 !status received 09-06-07 !priority High !difficulty Hard !subject Detecting dangerous order dependencies !summary Define rules to reject expressions with obvious cases of order-dependence. !problem Ada does not define the order of evaluation of parameters in calls. This opens up the possibity that function calls with side-effects can be non-portable. The problem gets worse with "in out" and access parameters, as proper evaluation may depend on a particular order of evaluation. Arguably, Ada has selected the worst possible solution to evaluation order dependencies: it allows such dependencies (by not specifying an order of evaluation), does not detect them in any way, and then says that if you depend on one (even if by accident), your code will fail at some point in the future when your compiler changes. Something should be done about this. !proposal (See wording.) !wording [I don't know where best to put this, perhaps in 6.4? - RLB] [We talk about access paths in 6.2, parameter modes, so that would seem to be a good place to discuss this related issue. -- STT] Add after 6.2(11): Two names or prefixes, N1 and N2, are *known to denote the same object* if: * N1 statically denotes a part of a stand-alone object or parameter, and N2 statically denotes the same part of the same stand-alone object or parameter; or [We're assuming that this bullet covers selected_components, as those are always known at compile-time - ED] * N1 is a dereference (implicit or explicit) of P1, N2 is a dereference (implicit or explicit) of P2, and prefixes P1 and P2 are known to denote the same object; or * N1 is an indexed_component P1(I1,...), N2 is an indexed_component P2(I2,...), the prefix P1 is known to denote the same object as the prefix P2, and for each index of the indexed_component, I1 and I2 are static expressions with the same value, or I1 and I2 are names that are known to denote the same object; or * N1 is a slice P1(S1), N2 is a slice P2(S2), the prefixes P1 and P2 are known to denote the same object, and the subtypes denoted by S1 and S2 statically match. AARM Discussion: This is determined statically. If the name contains some dynamic portion other than a dereference, indexed_component, or slice, it is not "known to denote the same object". [We could also use the same rules for indexes for the bounds of slices that have explicit bounds, although it doesn't seem very likely to occur and the wording is messy.] Two names N1 and N2 are *known to refer to the same object* if N1 and N2 are known to denote the same object, or if N1 is known to denote a subcomponent of the object denoted by N2, or vice-versa. AARM Reason: This ensures that names Prefix.Comp and Prefix are known to refer to the same object for the purposes of the rules below. This intentionally does not include dereferences; we only want to worry about accesses to the same object, and a dereference changes the object in question. (There is nothing shared between an access value and the object it designates.) A type is *known to be passed by reference* if it is tagged or immutably limited (see 7.5). AARM Reason: The by-reference property breaks privacy by requiring information about the full definition of partial views; these properties do not depend on the full definition of partial views. If a call C has two or more parameters of mode in out or out that are of a type that is not known to be passed by reference, then the call is legal only if: * For each name N of an that is passed as a parameter of mode in out or out to the call C, there is no other name among the other parameters of mode in out or out to C that is known to refer to same object. [Editor's note: see the discussion item about compatibility. Also note that I changed "denote the same object" to "refer to the same object", because this now includes composite types and thus we need the more complex matching included in "refer".] If a construct C has two or more direct constituents that are names or expressions whose evaluation may occur in an arbitrary order, at least one of which contains a function call with an in out or out parameter, then the construct is legal only if: * For each name N that is passed as a parameter of mode in out or out to some inner function call C2 (not including the construct C itself), there is no other name anywhere within a direct constituent of the construct C other than the one containing C2, that is known to refer to the same object. For the purposes of checking this rule: * For an array aggregate, an expression associated with a discrete_choice_list that has two or more discrete choices, or that has a nonstatic range, is considered as two or more separate occurrences of the expression; * For a record aggregate: - The expression of a record_component_association is considered to occur once for each associated component; and - The default_expression for each record_component_association with <> for which the associated component has a default_expression is considered part of the aggregate; * For a call, any default_expression evaluated as part of the call is considered part of the call. AARM Ramification: We do not check expressions that are evaluated only because of a component initialized by default in an aggregate (via <>). [Editor's note: I'm a bit dubious about these default_expression rules. These expressions are not visible to the programmer and may not actually modifiable by them. OTOH, it isn't very likely that they would cause a problem. These additional rules were suggested by the minutes of the Brest meeting, so they were added here.] AARM Reason: These rules prevent obvious cases of dependence on the order of evaluation of names or expressions. Such dependence is usually a bug, and in any case, is not portable to another implementation (or even another optimization setting). In the case that the top-level construct C is a call, these rules do not require checks for most in out parameters, as the rules about evaluation of calls prevent problems. Similarly, we do not need checks for short circuit operations or other operations with a defined order of evaluation. The rules about arbitrary order (see 1.1.4) allow evaluating parameters and writing parameters back in an arbitrary order, but not interleaving of evaluating parameters of one call with writing parameters back from another - that would not correspond to any allowed sequential order. End AARM Reason. AARM Ramification: Note that the latter requirement cannot fail for a procedure or entry call alone; there must be at least one function with an in out or out parameter called as part of a parameter expression of the call in order for it to fail. !discussion In order to discuss this topic, we need to look at some examples. type A_Rec is {tagged} record C : Natural := 0; end record; -- Note: We're going to talk about this record both as if it is -- tagged, and then as if it is not tagged and passed by copy. -- Thus the {tagged} here. procedure P1 (Input : in A_Rec; Output : in out A_Rec) is begin Output.C := Output.C + Input.C; end P1; procedure P2 (Bump1, Bump2 : in out A_Rec) is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; end P2; function F1 (Bumpee : access A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F1; function F2 (Bumpee : in out A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F2; function F3 (Obj : in A_Rec) return Natural is begin return Obj.C; end F3; function F4 (Obj : access A_Rec) return Natural is begin return Obj.C; end F4; function F5 (Bump1, Bump2 : access A_Rec) return A_Rec is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; return (C => Bump1.C - 1); end F5; function F6 (Bump1, Bump2 : in out A_Rec) return A_Rec is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; return (C => Bump1.C - 1); end F6; function F7 (Bumpee : in out A_Rec; Bumper : in A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + Bumper.C; return (C => Bumpee.C); end F6; O1 : {aliased} A_Rec := (C => 1); -- "aliased" only if needed by -- the particular example below. O2, O3 : A_Rec := (C => 1); type Acc_A_Rec is access all A_Rec; A1 : Acc_A_Rec := O1'access; The usual concern about the use of *in out* parameters in functions begins something like: Imagine writing an expression like: if F3(O1) = F3(F2(O1)) then This expression has an evaluation order dependency: if the expression is evaluated left-to-right, the result is True (both values have (C => 1) and O1.C is set to 2 afterwards), and if the expression is evaluated right-to-left, the result is False (the right operand is still (C => 1), but now the left operand is (C => 2), and O1.C is still 2 afterwards). This is usually used as a reason to not allow *in out* parameters on functions. If you have to use access parameters, then the expression is: if F3(O1) = F3(F1(O1'access)) then and the use of 'access and aliased on the declaration of O1 should provide a red flag about the possible order dependence. However, this red flag only occurs some of the time. First of all, access objects are implicitly converted to anonymous access types, so no red flag is raised when using them: if F3(A1.all) = F3(F1(A1)) then Perhaps the .all on the left-hand argument could be considered a red flag. But of course that doesn't apply if that function also takes an access parameter: if F4(A1) = F3(F1(A1)) then We have the same order dependency, but there is no sign of a red flag here. This is all Ada 95 code, but Ada 2005 makes this situation worse by adding prefix views and implicit 'access. If A_Rec is tagged, we can write: if O1.F3 = O1.F1.F3 then And since tagged parameters are implicitly aliased, if O1 is a tagged parameter, there isn't the slightest sign of a red flag here. This shows that we are already in a very deep pit. One can argue whether moving the rest of the way to the bottom is that significant. [Thanks to Pascal Leroy for showing just how bad the current situation is.] We can show similar problems with procedure calls. Consider: P1 (Input => F2(O1), Output => O1); If O1 is tagged (and thus passed by reference), this will set O1.C to 3 in either order of evaluation. (In both cases, Input will have (C => 1) and output will have (C => 2)). But if O1 is not tagged (and thus passed by copy), this will set O1.C to 3 if evaluated left-to-right [F2 sets O1.C to 2, then passes (C => 1) to Input; Output is passed (C => 2); and then that summed to (C => 3)] and O1.C will be 2 if evaluated right-to-left [Output is passed (C => 1); F2 sets O1.C to 2, then passes (C => 1) to Input; and then that is summed to (C => 2)]. We can write similar code in Ada 95: P1 (Input => F1(A1), Output => A1.all); getting the same order dependence. We can also get an order dependence from a single call (even in Ada 83): P2 (O1, O1); but only if there are multiple parameters that can modify an object, and interestingly, only if the parameters are passed by-copy. (If A_Rec is tagged, for example, the value of O1.C will be increased by 3, no matter what order the parameters are evaluated in.) That means that the Ada 95 call: P1 (Input => F5 (A1, A1), Output => O1); cannot have an order dependence, but *in out* parameters: P1 (Input => F6 (O1, O1), Output => O2); could, but only if A_Rec is passed by copy. Note that a single call with only one modifiable parameter cannot have an order dependence: P1 (Input => O1, Output => O1); will always end up with the same result, not matter what order the parameters are evaluated in. (That result could depend on the parameter passing mode, but that is controllable by using parameters of types that are by-copy or by-reference and in any case is not the problem we are discussing here). The parameters will be evaluated before the call; for by-copy they'll both have the value (C => 1) and the result value of (C => 2) will be written after the call. No part of the expression will use the modified value after it is modified, so there cannot be a dependence. Similarly, P1 (Input => F7 (O1, O1), Output => O2); does not have an order dependence, as again, no part of the expression could depend on the modified value of O1. ---- After much analysis (mostly outlined in AI05-0144-1), we settled on the following principles: (1) This problem is more insidious than "ordinary" side effects in functions. A function a routine that uses a side-effect internally ought to have been written so that the side-effect doesn't damage the correctness of the function. (Otherwise, the function could only be called once, which would be unusual.) On the other hand, a side effect occurring in a call is could not be known to the author of the function and may not be known to the author of the call either. This is code that is clearly non-portable, and is likely to break on a different compiler (or different compiler version, or even different optimization settings). That makes it more dangerous than the usual cases we've been living with for years. (2) We must avoid having these checks annoy programmers by rejecting perfectly safe things. Therefore, we will generate errors only when there is a certainty that the result depends on the order of evaluation (when it is arbitrary) and/or the parameter passing mechanism selected (pass by copy/pass by reference). In particular, that means we will not reject programs just because array indexes might be calculated to be the same value. (3) The issue occurs anywhere the language defines an arbitrary order of evaluation (which is most places). The problem could occur just as easily in an aggregate or assignment statement as in a nest of calls. (4) The language does make it very clear that all of the parameters are evaluated (in an arbitrary order), the call is made, and then the parameters are copied back (in an arbitrary order). Mixing parameter evaluations and copies back is not allowed, and that reduces the scope of the problem somewhat. (5) It probably doesn't pay to try to check access values, as they are rarely analyzable, they're effectively reference parameters (which usually are well-defined) and users expect them to be aliased. In addition, checking access parameters would be incompatible. --- Note that there is no generic contract issues with these rules. This is a case where more is allowed (strictly) in an instance when more information (such as about the kinds of types) is available. In the generic, all of these cases would be illegal for generic formal types; the only time things would be legal if if the instance has the "right" actuals (but that's irrelevant since the generic is already illegal). --- The rule about multiple in out parameters in a single call is incompatible, but virtually all programs that would be made illegal would be very dubious. For instance: procedure Do_It (Double, Triple : in out Natural) is begin Double := Double * 2; Triple := Triple * 3; end Do_It; Var : Natural := 2; Do_It (Var, Var); -- Illegal by new rules. Since whether Var contains 4 or 6 after the call to Do_It depends on the compiler version, optimization settings, and potentially the phase of the moon, depending on code like this is just a ticking time bomb. So this check will mostly detect bugs. [Editor's note: The expansion of this rule to everything that is not required to be passed by-reference will also expand the incompatibility to some cases where there is no actual problem - such as large untagged record types, which probably are passed by reference by all compilers but are not required to be passed that way. Thus the rule we have adopted seems to violate the principle of not rejecting safe things. Admittedly, Erhard does not share my feeling that by-reference parameters is safe (even though the language semantics is well-defined and all such uses are portable). I fear that Erhard's insistence on expanding this applicability will eventually cause the entire rule to be dropped -- which would be a massive pity.] --- The decision to exclude anonymous access parameters from this cheecking means that most of the initial examples in fact are still legal (even if insidious). For instance, the Ada 95 example from above: if F4(A1) = F3(F1(A1)) then is not detected by the proposed rules. This is mainly for compatibility reasons: since Ada 95 code could contain these sorts of problems, we don't want to make a lot of it illegal (even if it is dangerous). The main argument used is that functions with access parameters are common (as that was the workaround to not having "in out" parameters). It is annoying that the existence of that workaround is being used to make it harder to convert to "in out" parameters in those functions (as the result might be illegal while the original code was not -- even though both are equally dubious), but that cannot be helped. [Editor's note: I'm still dubious about this decision, especially as we seem willing to take the incompatibility for the multiple parameter case.] !example (See discussion.) !ACATS test !appendix From: Tucker Taft Sent: Sunday, June 7, 2009 7:14 AM I thought about this some more, and tried again. You had generalized your checks to cover assignment statements, but there are a lot of constructs that allow arbitrary order of evaluation (aggregates, constraints, record type elaboration, etc.). I concluded we should focus on the "arbitrary order of evaluation" rather than on calls. I also separated out the handling of elementary in-out and out parameters, as that seems like a pretty different problem (order of copy-back), and it doesn't help to mix it in with the arbitrary order of evaluation stuff. I also limited it to cases of two or more in-out or out parameters to the same call. I don't think inner calls make any difference here. (Your original wording was confusing to me, so you might have not meant to worry about inner calls either.) If we have aliased parameters, then we only care about non-explicitly-aliased elementary in-out/out parameters, but I left out that subtlety for now. I didn't bother talking about the effective "out" parameter mode of the left-hand side of an assignment, as that seems irrelevant to the rules. The problems come with the arbitrary order of evaluation of the LHS vs. the RHS, not with the actual assignment operation itself, which happens after all of the LHS and RHS evaluation is done. Finally, I didn't try to worry about general access-type parameters, but only access parameters with 'Access or 'Unchecked_Access. In any case once you get into access types, you run into all kinds of ways things can go wrong, so I would rather not venture into that can of worms too far. I think we might want to avoid any discussion of parameters of an access type, and just have the one bullet which deals with evaluation order. [This was version /01 of the AI. This was the version discussed at the Brest ARG meeting. This was made as a separate alternative so the original lengthy discussion could be preserved without rewriting it.] ****************************************************************