!standard 6.02 (11) 10-05-03 AI05-0144-2/06 !class Amendment 09-06-07 !status Amendment 2012 10-04-23 !status ARG Approved 5-0-4 10-02-26 !status work item 09-06-07 !status received 09-06-07 !priority High !difficulty Hard !subject Detecting dangerous order dependences !summary Define rules to reject expressions with obvious cases of order-dependence. !problem Ada does not define the order of evaluation of parameters in calls. This opens up the possibity that function calls with side-effects can be non-portable. The problem gets worse with "in out" and access parameters, as proper evaluation may depend on a particular order of evaluation. Arguably, Ada has selected the worst possible solution to evaluation order dependences: it allows such dependences (by not specifying an order of evaluation), does not detect them in any way, and then says that if you depend on one (even if by accident), your code will fail at some point in the future when your compiler changes. Something should be done about this. !proposal The following rules eliminate the most obvious side effects that can cause evaluation order problems. These rules are checked statically. Unlike most static rules, which are conservative, these rules are liberal in that they do not attempt to prevent all evaluation order problems, just ones that are certain to be problems. !wording Add after 6.2(11): Two names are *known to denote the same object* if: * both names statically denote the same stand-alone object or parameter; or * both names are selected_components, their prefixes are known to denote the same object, and their selector_names denote the same component; or * both names are dereferences (implicit or explicit), the dereferenced names are known to denote the same object, and both names have the same immediately enclosing complete context (see 8.6); or AARM Reason: We need the requirement to have the same enclosing complete context in order to avoid problems with renames. Consider: type Ref is access Some_Type; Ptr : Ref := new Some_Type'(...); X : Some_Type renames Ptr.all; begin Ptr := new Some_Type'(...); P (Func_With_Out_Params (Ptr.all, X)); X and Ptr.all should not be known to denote the same object, since they denote different allocated objects. End AARM Reason. * both names are indexed_components, their prefixes are known to denote the same object, and each of the pairs of corresponding index values are known to have the same value; or AARM Discussion: "Statically known to have the same value" is defined below. The term is necessary to avoid problems with renames similar to the dereference example above. * both names are slices, their prefixes are known to denote the same object, and the two slices have statically matching index constraints; or * one of the two names statically denotes a renaming declaration whose renamed object_name is known to denote the same object as the other name. AARM Reason: This exposes known renamings of slices, indexing, and so on to this definition. In particular, if we have C : Character renames S(1); then C and S(1) are known to denote the same object. End AARM Reason. AARM Ramification: "Known to denote the same object" is intended to be an equivalence relationship, that is, it is reflexive, symmetric, and transitive. We believe this follows for the rules. For instance, given the following declarations: S : String(1..10); ONE : constant Natural := 1; R : Character renames S(1); the names R and S(1) are known to denote the same object by the sixth bullet, and S(1) and S(ONE) are known to denote the same object by the fourth bullet, so using the sixth bullet on R and S(ONE), we simply have to test S(1) vs. S(ONE), which we already know denote the same object. END AARM Ramification. AARM Discussion: Whether or not names or prefixes are known to denote the same object is determined statically. If the name contains some dynamic portion other than a dereference, indexed_component, or slice, it is not "known to denote the same object". These rules make no attempt to handle slices of objects that are known to be the same when the slices have dynamic bounds (other than the trivial case of bounds being defined by the same subtype), even when the bounds could be proven to be the same, as it is just too complex to get right and these rules are intended to be conservative. End AARM Discussion. Given two names or expressions of the same discrete type, one is *known to have the same value* as the other if * both are static expressions and their values are the same; or * both are names and the two names are known to denote the same object and that object is a constant object; or AARM discussion: A dereference of an access-to-constant value denotes a constant view of a potentially variable object, not a constant object. * both are names and the two names are known to denote the same object and both names have the same immediately enclosing complete context (see 8.6); or * one of the two is a name which statically denotes a renaming declaration whose renamed object_name is known to have the same value as the other name or expression; or * one of the two is a name which statically denotes a non-deferred constant object whose initialization expression is known to have the same value as the other name or expression; or * one of the two is a parenthesized expression or qualified_expression whose operand is known to have the same value as the other name or expression. AARM Discussion: This is a slippery slope in that we could continue to add rules here until the cows come home [thanks to the Beatles for that phrase]. For instance, we could imagine having "Both expressions are calls to the same predefined operator, and the corresponding operands of the expressions are statically known to have the same value." But the above captures all of the most likely cases. AARM To Be Honest: "Known to have the same value" doesn't guarantee that the value is the same in some obscure cases: * An intervening function call changes the value of an object included in one of the expressions; * One of the expressions includes a Volatile variable modified outside of the program; * One of the expressions includes a variable defined with an address clause (or similar features) and an intervening subprogram call modifies its location. These aren't a problem for this use; the first case is exactly what we are trying to detect (code that is not portable because it depends on the order of evaluation), and the others are unspeakably tricky if they are actually intended behavior. However, future authors of the standard should take care if they use this term in some other context. End AARM To Be Honest. Two names are *known to refer to the same object* if The two names are known to denote the same object; or One of the names is a selected_component, indexed_component, or slice and its prefix is known to refer to the same object as the other name; or One of the two names statically denotes a renaming declaration whose renamed object_name is known to refer to the same object as the other name. AARM Reason: This ensures that names Prefix.Comp and Prefix are known to refer to the same object for the purposes of the rules below. This intentionally does not include dereferences; we only want to worry about accesses to the same object, and a dereference changes the object in question. (There is nothing shared between an access value and the object it designates.) If a call C has two or more parameters of mode in out or out that are of an elementary type, then the call is legal only if: * For each name N that is passed as a parameter of mode in out or out to the call C, there is no other name among the other parameters of mode in out or out to C that is known to denote the same object. AARM To Be Honest: This means *visibly* an elementary type; it does not include partial views of elementary types (partial views are always composite). That's necessary to avoid having Legality Rules depend on the contents of the private part. If a construct C has two or more direct constituents that are names or expressions whose evaluation may occur in an arbitrary order, at least one of which contains a function call with an in out or out parameter, then the construct is legal only if: * For each name N that is passed as a parameter of mode in out or out to some inner function call C2 (not including the construct C itself), there is no other name anywhere within a direct constituent of the construct C other than the one containing C2, that is known to refer to the same object. For the purposes of checking this rule: * For an array aggregate, an expression associated with a discrete_choice_list that has two or more discrete choices, or that has a nonstatic range, is considered as two or more separate occurrences of the expression; * For a record aggregate: - The expression of a record_component_association is considered to occur once for each associated component; and - The default_expression for each record_component_association with <> for which the associated component has a default_expression is considered part of the aggregate; * For a call, any default_expression evaluated as part of the call is considered part of the call. AARM Ramification: We do not check expressions that are evaluated only because of a component initialized by default in an aggregate (via <>). AARM Reason: These rules prevent obvious cases of dependence on the order of evaluation of names or expressions. Such dependence is usually a bug, and in any case, is not portable to another implementation (or even another optimization setting). In the case that the top-level construct C is a call, these rules do not require checks for most in out parameters, as the rules about evaluation of calls prevent problems. Similarly, we do not need checks for short circuit operations or other operations with a defined order of evaluation. The rules about arbitrary order (see 1.1.4) allow evaluating parameters and writing parameters back in an arbitrary order, but not interleaving of evaluating parameters of one call with writing parameters back from another - that would not correspond to any allowed sequential order. End AARM Reason. AARM Ramification: Note that the latter requirement cannot fail for a procedure or entry call alone; there must be at least one function with an in out or out parameter called as part of a parameter expression of the call in order for it to fail. !discussion In order to discuss this topic, we need to look at some examples. type A_Rec is {tagged} record C : Natural := 0; end record; -- Note: We're going to talk about this record both as if it is -- tagged, and then as if it is not tagged and passed by copy. -- Thus the {tagged} here. procedure P1 (Input : in A_Rec; Output : in out A_Rec) is begin Output.C := Output.C + Input.C; end P1; procedure P2 (Bump1, Bump2 : in out A_Rec) is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; end P2; function F1 (Bumpee : access A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F1; function F2 (Bumpee : in out A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F2; function F3 (Obj : in A_Rec) return Natural is begin return Obj.C; end F3; function F4 (Obj : access A_Rec) return Natural is begin return Obj.C; end F4; function F5 (Bump1, Bump2 : access A_Rec) return A_Rec is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; return (C => Bump1.C - 1); end F5; function F6 (Bump1, Bump2 : in out A_Rec) return A_Rec is begin Bump1.C := Bump1.C + 1; Bump2.C := Bump2.C + 2; return (C => Bump1.C - 1); end F6; function F7 (Bumpee : in out A_Rec; Bumper : in A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + Bumper.C; return (C => Bumpee.C); end F6; O1 : {aliased} A_Rec := (C => 1); -- "aliased" only if needed by -- the particular example below. O2, O3 : A_Rec := (C => 1); type Acc_A_Rec is access all A_Rec; A1 : Acc_A_Rec := O1'access; The usual concern about the use of *in out* parameters in functions begins something like: Imagine writing (in a future version of Ada that allows *in out* parameters) an expression like: if F3(O1) = F3(F2(O1)) then This expression has an evaluation order dependency: if the expression is evaluated left-to-right, the result is True (both values have (C => 1) and O1.C is set to 2 afterwards), and if the expression is evaluated right-to-left, the result is False (the right operand is still (C => 1), but now the left operand is (C => 2), and O1.C is still 2 afterwards). This is usually used as a reason to disallow *in out* parameters on functions. If you have to use access parameters, then the expression is: if F3(O1) = F3(F1(O1'access)) then and the use of 'access and aliased on the declaration of O1 should provide a red flag about the possible order dependence. However, this red flag only occurs some of the time. First of all, access objects are implicitly converted to anonymous access types, so no red flag is raised when using them: if F3(A1.all) = F3(F1(A1)) then Perhaps the .all on the left-hand argument could be considered a red flag. But of course that doesn't apply if that function also takes an access parameter: if F4(A1) = F3(F1(A1)) then We have the same order dependency, but there is no sign of a red flag here. All of these calls can be written in Ada 95, but Ada 2005 makes this situation worse by adding prefix views and implicit 'access. If A_Rec is tagged, we can write: if O1.F3 = O1.F1.F3 then And since tagged parameters are implicitly aliased, if O1 is a tagged parameter, there isn't the slightest sign of a red flag here. This shows that we are already in a very deep pit. One can argue whether moving the rest of the way to the bottom is that significant. [Thanks to Pascal Leroy for showing just how bad the current situation is.] We can show similar problems with procedure calls. Consider: P1 (Input => F2(O1), Output => O1); If O1 is tagged (and thus passed by reference), this will set O1.C to 3 in either order of evaluation. (In both cases, Input will have (C => 1) and output will have (C => 2)). But if O1 is not tagged (and thus passed by copy), this will set O1.C to 3 if evaluated left-to-right [F2 sets O1.C to 2, then passes (C => 1) to Input; Output is passed (C => 2); and then that summed to (C => 3)] and O1.C will be 2 if evaluated right-to-left [Output is passed (C => 1); F2 sets O1.C to 2, then passes (C => 1) to Input; and then that is summed to (C => 2)]. We can write similar code in Ada 95: P1 (Input => F1(A1), Output => A1.all); getting the same order dependence. We can also get an order dependence from a single call (even in Ada 83): P2 (O1, O1); but only if there are multiple parameters that can modify an object, and interestingly, only if the parameters are passed by-copy. (If A_Rec is tagged, for example, the value of O1.C will be increased by 3, no matter what order the parameters are evaluated in.) That means that the Ada 95 call: P1 (Input => F5 (A1, A1), Output => O1); cannot have an order dependence, but *in out* parameters: P1 (Input => F6 (O1, O1), Output => O2); could, but only if A_Rec is passed by copy. Note that a single call with only one modifiable parameter cannot have an order dependence: P1 (Input => O1, Output => O1); will always end up with the same result, not matter what order the parameters are evaluated in. (That result could depend on the parameter passing mode, but that is controllable by using parameters of types that are by-copy or by-reference and in any case is not the problem we are discussing here). The parameters will be evaluated before the call; for by-copy they'll both have the value (C => 1) and the result value of (C => 2) will be written after the call. No part of the expression will use the modified value after it is modified, so there cannot be a dependence. Similarly, P1 (Input => F7 (O1, O1), Output => O2); does not have an order dependence, as again, no part of the expression could depend on the modified value of O1. ---- After much analysis (mostly outlined in AI05-0144-1), we settled on the following principles: (1) This problem is more insidious than "ordinary" side effects in functions. A function a routine that uses a side-effect internally ought to have been written so that the side-effect doesn't damage the correctness of the function. (Otherwise, the function could only be called once, which would be unusual.) On the other hand, a side effect occurring in a call is could not be known to the author of the function and may not be known to the author of the call either. This is code that is clearly non-portable, and is likely to break on a different compiler (or different compiler version, or even different optimization settings). That makes it more dangerous than the usual cases we've been living with for years. (2) We must avoid having these checks annoy programmers by rejecting perfectly safe things. Therefore, we will generate errors only when there is a certainty that the result depends on the order of evaluation (when it is arbitrary) and/or the parameter passing mechanism selected (pass by copy/pass by reference). In particular, that means we will not reject programs just because array indexes might be calculated to be the same value. (3) The issue occurs anywhere the language defines an arbitrary order of evaluation (which is most places). The problem could occur just as easily in an aggregate or assignment statement as in a nest of calls. (4) The language does make it very clear that all of the parameters are evaluated (in an arbitrary order), the call is made, and then the parameters are copied back (in an arbitrary order). Mixing parameter evaluations and copies back is not allowed, and that reduces the scope of the problem somewhat. (5) It probably doesn't pay to try to check access values, as they are rarely analyzable, they're effectively reference parameters (which usually are well-defined) and users expect them to be aliased. In addition, checking access parameters would be incompatible. --- Note that there is no generic contract issues with these rules. This is a case where more is allowed (strictly) in an instance when more information (such as about the kinds of types) is available. In the generic, all of these cases would be illegal for generic formal types; the only time things would be legal if if the instance has the "right" actuals (but that's irrelevant since the generic is already illegal). --- The rule about multiple in out parameters in a single call is incompatible, but virtually all programs that would be made illegal would be very dubious. For instance: procedure Do_It (Double, Triple : in out Natural) is begin Double := Double * 2; Triple := Triple * 3; end Do_It; Var : Natural := 2; Do_It (Var, Var); -- Illegal by new rules. Since whether Var contains 4 or 6 after the call to Do_It depends on the compiler version, optimization settings, and potentially the phase of the moon, depending on code like this is just a ticking time bomb. So this check will mostly detect bugs. (Checks against existing code bases showed very few instances of this incompatibility, and most were undetected bugs.) This rule only applies parameters that are certain to be passed by copy. Parameters that are passed by reference have a defined semantics and are not (necessarily) a problem. Since we intend to only reject known-to-be dubious constructs, including types that *might* be passed by copy would be going too far. --- The decision to exclude anonymous access parameters from this checking means that most of the initial examples in fact are still legal (even if insidious). For instance, the Ada 95 example from above: if F4(A1) = F3(F1(A1)) then is not detected by the proposed rules. This is mainly for compatibility reasons: since Ada 95 code could contain these sorts of problems, we don't want to make a lot of it illegal (even if it is dangerous). The main argument used is that functions with access parameters are common (as that was the workaround to not having "in out" parameters). This would also violate our known-to-be dubious rule. Such access parameters are problems only if they are dereferenced, and we cannot know whether the body of the subprogram actually does that. Without such a dereference, there is no problem. (Of course, it is likely that they will be dereferenced.) It is annoying that the existence of that workaround is being used to make it harder to convert to "in out" parameters in those functions (as the result might be illegal while the original code was not -- even though both are equally dubious), but that cannot be helped. !example (See discussion.) !corrigendum 6.2(11) @dinsa For parameters of other types, it is unspecified whether the parameter is passed by copy or by reference. @dinss @s8<@i> Two @fas are @i if: @xbullets statically denote the same stand-alone object or parameter; or> @xbullets are @fas, their @faes are known to denote the same object, and their @fas denote the same component; or> @xbullets are dereferences (implicit or explicit), the dereferenced @fas are known to denote the same object, and both @fas have the same immediately enclosing statement or declaration; or> @xbullets are @fas, their @faes are known to denote the same object, and each of the pairs of corresponding index values are either static expressions with the same value or @fas that are known to denote the same object; or> @xbullets are @fas, their @faes are known to denote the same object, and the two @fas have statically matching index constraints; or @xbullets statically denotes a renaming declaration whose renamed @i@fa is known to denote the same object as the other @fa; or @xbullets are known to denote the same object as a third @fa.> Two @fas are @i if the @fas are known to denote the same object, or if one of the two @fas is known to denote a subcomponent or slice of the object denoted by the others. If a call @i has two or more parameters of mode @b or @b that are of an elementary type, then the call is legal only if: @xbullet @i that is passed as a parameter of mode @b or @b to the call @i, there is no other @fa among the other parameters of mode @b or @b to @i that is known to denote the same object.> If a construct @i has two or more direct constituents that are @fas or @fas whose evaluation may occur in an arbitrary order, at least one of which contains a function call with an @b or @b parameter, then the construct is legal only if: @xbullet that is passed as a parameter of mode @b or @b to some inner function call @i (not including the construct @i itself), there is no other @fa anywhere within a direct constituent of the construct @i other than the one containing @i, that is known to refer to the same object.> For the purposes of checking this rule: @xbullet, an @fa associated with a @fa that has two or more discrete choices, or that has a nonstatic range, is considered as two or more separate occurrences of the @fa;> @xbullet:> @xinbull of a @fa is considered to occur once for each associated component; and> @xinbull for each @fa with <@> for which the associated component has a @fa is considered part of the @fa;> @xbullet evaluated as part of the call is considered part of the call.> !ACATS test !appendix From: Tucker Taft Sent: Sunday, June 7, 2009 7:14 AM I thought about this some more, and tried again. You had generalized your checks to cover assignment statements, but there are a lot of constructs that allow arbitrary order of evaluation (aggregates, constraints, record type elaboration, etc.). I concluded we should focus on the "arbitrary order of evaluation" rather than on calls. I also separated out the handling of elementary in-out and out parameters, as that seems like a pretty different problem (order of copy-back), and it doesn't help to mix it in with the arbitrary order of evaluation stuff. I also limited it to cases of two or more in-out or out parameters to the same call. I don't think inner calls make any difference here. (Your original wording was confusing to me, so you might have not meant to worry about inner calls either.) If we have aliased parameters, then we only care about non-explicitly-aliased elementary in-out/out parameters, but I left out that subtlety for now. I didn't bother talking about the effective "out" parameter mode of the left-hand side of an assignment, as that seems irrelevant to the rules. The problems come with the arbitrary order of evaluation of the LHS vs. the RHS, not with the actual assignment operation itself, which happens after all of the LHS and RHS evaluation is done. Finally, I didn't try to worry about general access-type parameters, but only access parameters with 'Access or 'Unchecked_Access. In any case once you get into access types, you run into all kinds of ways things can go wrong, so I would rather not venture into that can of worms too far. I think we might want to avoid any discussion of parameters of an access type, and just have the one bullet which deals with evaluation order. [This was version /01 of the AI. This was the version discussed at the Brest ARG meeting. This was made as a separate alternative so the original lengthy discussion could be preserved without rewriting it.] **************************************************************** From: Bob Duff Sent: Saturday, June 13, 2009 11:30 AM > Here is the version I am proposing. I agree with your proposal. Mostly editorial comments below (maybe that's premature?). ... > (See wording.) I think a summary of the proposal should go here. Such as: The following rules eliminate the most obvious side effects that can cause evaluation order problems. These rules are checked statically. Unlike most static rules, which are conservative, these rules are liberal in that they do not attempt to prevent all evaluation order problems. Otherwise, one gets lost in the detailed definitions, wondering what the point is. Or else put some similar text *before* the detailed wording, either as an AARM annotation, or a [bracketed introductory paragraph]. > !wording ... > AARM Discussion: This is determined statically. If the name > contains ^^^^ > some dynamic portion other than a dereference, indexed_component, or > slice, it is not "known to denote the same object". [We could also > use the same rules for indexes for the bounds of slices that have > explicit bounds, although it doesn't seem very likely to occur and > the wording is messy.] It's hard to know what "this" refers to above. I suggest moving the AARM annotation above the bullets, and change "This" to "This property". Or else keep it here, and spell it out: "Whether or not names or prefixes are known to denote the same object is determined statically. ..." > Two names N1 and N2 are *known to refer to the same object* if N1 and > N2 are known to denote the same object, or if N1 is known to denote a > subcomponent of the object denoted by N2, or vice-versa. ... > If a construct C has two or more direct constituents that are names or > expressions whose evaluation may occur in an arbitrary order, at least > one of which contains a function call with an in out, out, or > access-to-variable parameter, then the construct is legal only if: "access-to-variable parameter" seems confusing; I think you mean it to include named access types, as well as access parameters. How about "parameter of an access-to-variable type"? > * For each name N that is passed as a parameter of mode in out or out > to some inner function call C2 (not including the construct C > itself), there is no other name anywhere within a direct constituent > of the construct C other than the one containing C2, that is known > to refer to the same object; and > > * For each name N'Access or N'Unchecked_Access that is passed as an > access-to-variable parameter to some inner function call C2 (not > including the construct C itself), there is no other name anywhere > within a direct constitutent of the construct C other than the one > containing C2, that is known to refer to the same object as N. > > For the purposes of checking this rule on an array aggreagate, an aggregate > expression associated with a discrete_choice_list that has two or more > discrete choices, or that has a nonstatic range, is considered as two > or more separate occurrences of the expression. Similarly for a > record aggregate, the expression of a record_component_association is > considered to occur once for each associated component. > > AARM Reason: This prevents obvious cases of dependence on the order of ^^^^ Another dangling "this". > evaluation of names or expressions. Such dependence is usually a bug, > and in any case, is not portable to another implementation (or even > another optimization setting). > > The third bullet does not check for uses of the prefix, since the > access type ^^^^^^^^^^^^ Which third bullet? [Editor's note: The one Tucker deleted. ;-) He apparently didn't update these notes.] > and the designated object are not the same, and "known to denote the > same prefix" does not include dereferences anyway. ... > The usual concern about the use of *in out* parameters in functions > begins something > like: It would be useful to mark each of the following with a comment "-- OK" or "-- ERROR:" showing whether the proposed rules outlaw it. The proposed rules do not outlaw all the cases below, I think -- e.g. the cases that pass A1. Which is OK with me. > Imagine writing an expression like: > > if F3(O1) = F3(F2(O1)) then > > This expression has an evaluation order dependency: if the expression > is evaluated left-to-right, the result is True (both values have (C => > 1) and O1.C is set to 2 afterwards), and if the expression is > evaluated right-to-left, the result is False (the right operand is > still (C => 1), but now the left operand is (C => 2), and O1.C is still 2 afterwards). > > This is usually used as a reason to not allow *in out* parameters on functions. "to not allow" --> "not to allow" or "to disallow" > If you have to use access parameters, then the expression is: > > if F3(O1) = F3(F1(O1'access)) then > > and the use of 'access and aliased on the declaration of O1 should > provide a red flag about the possible order dependence. > > However, this red flag only occurs some of the time. First of all, > access objects are implicitly converted to anonymous access types, so > no red flag is raised when using them: > > if F3(A1.all) = F3(F1(A1)) then > > Perhaps the .all on the left-hand argument could be considered a red > flag. But of course that doesn't apply if that function also takes an access parameter: > > if F4(A1) = F3(F1(A1)) then > > We have the same order dependency, but there is no sign of a red flag here. > > This is all Ada 95 code, ... No, it's not -- I see calls to functions with 'in out' params above. [I'd don't see any such calls other than in a single 'straw man' case.] >...but Ada 2005 makes this situation worse by adding prefix views and >implicit 'access. If A_Rec is tagged, we can write: ... > We can also get an order dependence from a single call (even in Ada 83): > > P2 (O1, O1); > > but only if there are multiple parameters that can modify an object, > and ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ I think I know what you mean, but it's confusing -- parameters don't modify things. [Editor's note: neither do calls or anything else. They just assigned back as needed, apparently by some magical force. :-) Writing two extra sentences to be pedantic about topics such as this don't help understanding.] > interestingly, only if the parameters are passed by-copy. (If A_Rec is > tagged, for example, the value of O1.C will be increased by 3, no > matter what order the parameters are evaluated in.) ... > Survey of solutions It is very useful to have this "Survey" attached to this AI, for posterity! > It fairly obvious that that order dependencies are a problem in Ada, > and "that that" --> "that" > have been getting worse with each version of the language (even > without *in out* parameters in functions). Moreover, it has been used > as the primary reason for leaving something natural and useful (*in > out* parameters for functions) out of the language. ... > One obvious solution would be to define the order of evaluation, > eliminating the problem at the source. Java, for instance, requires > left-to-right evaluation. However, that would encourage tricky code > like the various examples by making it portable. I doubt if I'll convince anyone, but I think that's a bogus argument. People do, in fact, depend on eval order all the time, either by accident, or because they don't know the language rules. And there's nothing any language definition can say to stop it. A language definition can say, "It is considered bad style to depend on evaluation order (of...). Don't do that." That would have just as much effect as leaving the order undefined -- i.e. it puts people on notice, but doesn't entirely stop the problem. >... Moreover, Ada compilers have been using this flexibility for >decades; trying to remove it from compilers (particularly from >optimizers) could be very difficult. Note that this is the only >solution that actually could eliminate dependencies on side-effects inside of functions. > But defining the order of evaluation was considered for both Ada 83 >and Ada 95 and was deemed not worth it -- it's hard to see what has changed. > > Another option would be to increase the visibility of parameters with > side-effects. This sounds somewhat appealing (after all, it seems to > be the basis on which access parameters are deemed OK and *in out* > parameters are not). One possibility would be to add ordering symbols to named > notation: Param <- for an *in* parameter (this includes access); Param > -> for an *out* parameter; and Param <-> for an *in out* > parameter. Ada 80 (or thereabouts) used the symbols :=, =:, and :=: for this. I suggest you use this notation (before shooting it down below). And eliminate the "That's especially bad..." sentence below. > However, for compatibility, old code that don't use the symbols would > have to be allowed. That's especially bad because the symbol for > ordinary named parameters (=>) looks like the symbol for an *out* > parameter; while it usually will be an *in* parameter. Moreover, this > solution does nothing for positional parameters in calls nor for the > prefixes of prefix notation. And it is misleading for access > parameters, whose mode is officially "in", but still might cause side-effects. > > One could argue that positional parameters are already unsafe and > requiring named notation to be safe is not much of an imposition. But > the prefix and access issues are not so easily explained away. > Additionally, putting the mode into calls in some way makes > maintenance of programs harder: changing the mode of a call is going > to make many calls illegal, while today most calls will remain legal (all will > if the mode is changed from "in out" to "in"). I think that last part is bogus -- it's like saying the full coverage rules for aggregates make maintenance harder. > Another syntax suggestion that was made recently was to (optionally) > include the parameter mode as part of the call. That would look something like: > > if F3(in O1) = F3(in F2(in out O1)) then Shirley, you'd leave out the 'in's! > This could be applied to positional calls as well, but still provides > no help for prefix calls nor for access parameters. > > One could imagine requiring the syntax for calls of functions with *in > out* parameters and making it optional elsewhere. That might placate > *in out* parameter opponents, but otherwise doesn't seem to do much for the > language. If we were to do any of the above "marking [in]out params" syntax, we should also define a Restriction that forces it on all calls (not 'in' params, of course). Worth mentioning, even though we're not going this route. > Finally, we come to some sort of legality rules and/or runtime checks > for preventing such order dependencies. It is important to note that > making such rules too simple (and strong) only would mean that > temporaries have to be introduced in some expressions. That would be > an annoyance, but surely not as bad as the current ticking time-bomb. > > The easiest option is to blame all of the problems on functions with > *in out* parameters and make them stand alone. The rule would be something > like: > > A call of a function with an *in out* parameter must be the only call in > an expression. > > That would mean that > > if F2(O1) then > > would be legal (assuming F2 returned type Boolean), but > > if F2(O1) = (C => 1) then > > would not be. Obviously, this is too strict. I agree it's too strict, but I don't think it's "obviously" too strict. It allows: Blah : T := Func(...); which is one of the more useful cases. Especially when T is something like String. >...Amazingly, it also not strict enough: > > Some_Array(F3(O1)) := F2(O1); > > would be allowed. (An assignment statement is *not* an expression!) Well, then obviously the wording of the rule should not be "expression". It should be something like (in this rejected alternative): If a construct has two or more constituents whose evaluation may occur in an arbitrary order, and contains a call to a function with an [in]out param, then it shall contain no other calls. > A call of a function with an *in out* parameter must be the only call in > an expression or statement; > If a call of a function with an *in out* parameter is the source expression > of an assignment_statement, the target variable_name shall not include a > call, an indexed component, a slice, or a dereference. ... > So we could modify the definition of "potentially the same object" to: > > Two objects are considered to be "potentially the same object" one has > the type if ** > of a part of the other, or one has a part whose type is a partial view, unless: > * one object is part of a stand-alone object, and the other object is part ... > Such rules would only reject calls where it is clear that parts of the > same object are involved. That eliminates the complications of private > types (we won't look in them), arrays (we won't try to determine if they are > the same), and so on. "arrays" --> "array components", I think you mean. > These are the rules proposed in the !wording section above. ... > None of the proposed rules do anything about side-effects totally > inside of functions. One way to deal with that would be to require an > expression to contain only a single function call unless all of the functions are *strict*. I don't think "strict" is the right term, here. In computer science, "strict" means "arguments are evaluated at the call", as opposed to "lazy" and "non-strict" (which are almost, but not quite the same thing -- Haskell is non-strict), and "call by name". I'd consider extending "pure" to this concept. Or else don't define a term, just talk about "such functions" in this part. [Editor's note: I tried that and people hated it. A new term seemed better. Doesn't matter, because we're not doing any of this.] >... A strict > function exposes all of its side effects in its specification, meaning >it does not read or write global variables, call non-strict functions, >write hidden parts of parameters, etc. Most of the language-defined >functions are strict (that would have to be declared somehow). > > Strict functions have several other nice properties: they don't need > elaboration checking (freezing is sufficient to prevent > access-before-elaboration problems); they can be used in symbolic > reductions; and they can use the optimizations allowed for functions in pure packages. > > However, such a requirement would be quite incompatible. Moreover, > strict functions would be rather limiting by themselves. > > An alternative that has been suggested is to allow Global_In and > Global_Out annotations on subprograms, which would declare the global > side-effects of a subprogram. Such annotations could not be a lie > (they'd have to be checked in some way), and thus would fill the role > of strict functions more flexibly. But it would still be too > incompatible to ban dangerous side-effects in functions (although separate tools > or non-Ada operating modes could make such checks). Or pragma Restrictions. Same applies to the "strict" discussion above. **************************************************************** From: Tucker Taft Sent: Saturday, June 13, 2009 4:36 PM Thanks for reviewing this. We approved the intent, after deleting the bullet relating to access parameters, and generalizing the copy-back rules to apply to not only elementary types, but in fact any type that is not tagged or immutably limited. Basically, unless it is bloody obvious/guaranteed the type will be passed by reference, then we disallow having two out parameters that denote the same object in a single call. We didn't really review the discussion or examples, but it sounds like they will need some work. **************************************************************** From: Edmond Schonberg Sent: Tuesday, February 23, 2010 8:33 PM I can report on an unfinished experiment concerning the detection of dangerous order dependences. a) I implemented the check on multiple in-out parameters in a procedure, where the actuals of an elementary type overlap. In the 15,000 tests in our test suite I found two occurrences of P (X, X) or near equivalent. One of them (in Matt Heaney's code!) appears harmless. The other one is in a program full of other errors, so unimportant. So application of this rule should not break anything. b) Using the rules proposed in AI-0144-1 I checked for overlap between actuals in inner calls, when the formal is an access type (didn't implement in-out parameters for functions, there would be no code to test :-)). Here there is a problem, I can't even compile the GNAT library. For example, the following innocent code (from ada.containers.red_black_trees.generic_operations) violates the rule: if Parent (X) = Left (Parent (Parent (X))) then Here X is of an access type. Of course Parent is a pure function, but the compiler doesn't know that at this point, and it looks like an order dependence. So getting rid of the rule about access types (as done in 144-2) seems indispensable. To be continued. **************************************************************** From: Robert Dewar Sent: Wednesday, February 24, 2010 4:58 AM ... > a) I implemented the check on multiple in-out parameters in a > procedure, where the actuals of an elementary type overlap. In the > 15,000 tests in our test suite I found two occurrences of P (X, X) > or near equivalent. One of them (in Matt Heaney's code!) appears > harmless. The other one is in a program full of other errors, so > unimportant. So application of this rule should not break anything. I guess I can tolerate this rule, of course Ed's experiment also shows that it is almost certainly useless, so just another case of forcing compiler writers to waste time on nonsense. > b) Using the rules proposed in AI-0144-1 I checked for overlap > between actuals in inner calls, when the formal is an access type > (didn't implement in-out parameters for functions, there would be no > code to test :-)). Here there is a problem, I can't even compile the > GNAT library. For example, the following innocent code (from > ada.containers.red_black_trees.generic_operations) violates the rule: > > if Parent (X) = Left (Parent (Parent (X))) then > > Here X is of an access type. Of course Parent is a pure function, but > the compiler doesn't know that at this point, and it looks like an > order dependence. So getting rid of the rule about access types (as > done in 144-2) seems indispensable. I strongly dislike all this order dependence stuff, but for sure flagging the above line would be nonsense. **************************************************************** From: Randy Brukardt Sent: Wednesday, February 24, 2010 7:21 PM For the record, we've already abandoned this particular rule for access types. And this example makes it clear that's a good thing. The currently proposed rule (b) only applies to in out and out parameters of functions, so it can't possibly have a compatibility problem. (And Ed's experiment shows that rule (a) isn't significantly incompatible.) So I think we're good to go on this. As previously noted, the primary purpose of these rules is to cut the Gordian knot of opposition to allowing "in out" parameters on functions. I'll take some work for that benefit (that we should have had in Ada 95, IMHO). **************************************************************** From: Erhard Ploedereder Sent: Friday, February 26, 2010 10:41 AM Here is part I of my homework on aliasing. The following are details of a few aliasing situations, referring to files in big archives, which I can distribute at the meeting (gnat.tar has about 16 MB, the alias_pl.tar has 1 MB). I did not want to distribute the archives by e-mail. =================================== in Gnat, version ca 1999, see archive gnat.tar file osint.adb : aliaspair: Osint.locate_File and NameT.name_Buffer aliases caused at lines 795,804,820 calls on Locate_File alias effect in Locate_'File, writing to Name_buffer (line 760) alias origin in namet.ads, line 121 --------------------- file sem_ch4.adb : aliaspair: sem_ch4.analyse_membership_op.t.typ and sem_ch4.analyse_membership_op.Try_one_interp.T1 line 1617 bur note full assignment in line 1577 aliaspair: sem_ch4.find_comparison_types.it.typ and sem_ch4.find_comparison_types.it.Try_one_interp.T1 lines 3456 and assignment in line 3407 also lines 3602 and 3543 -------------------- file par_ch4.adb aliaspair: scans.token_ptr and par.ch4.bad_range_attribute.loc aliases caused by calls of Bad-Range_attribute in lines 1468, 1556, 1992 also relevant par-sync.adb (lines 163) and scans.ads (lines 326) see also scn.abd line 433 ======================= for the following 2 cases, see alias_pl.tar file: alias_pl/adabasis/Stubber/alias.txt aliaspair: Gettoken.get_token.token, gettoken.get_token.add_on.sT many calls on GetToken ----------------- file: alias_pl/scannergenerator aliaspair: Str_Tree.Calculatons.Replace_Node.Node and Str_Tree.Calculatons.Replace_Node.New_Node aliases in many calls out of Str_Tree.Sem_checks **************************************************************** From: Erhard Ploedereder Sent: Friday, February 26, 2010 10:46 AM Here is an unusual (non-)aliasing situation. Nasty, but possibly not relevant. ===================== cut here ================== -- this code is a text-book implementation of deletion from a sorted binary -- search tree. As far as I know, the code was originally written by Niklaus -- Wirth in Pascal or Modula. The code here is a literal translation into -- Ada, plus English identifiers, and a slightly "de-nested" structure. -- It exhibits a rather nasty problem.... -- At the marked place, the original code relied on VAR parameters being -- passed by reference. If aliasing occured, the code produced the "right" -- effect of a "noop", albeit in a very screwy way. Passing the parameters -- by value-result corrupts the tree into a cyclic structure, if the first -- parameter is the subTreeLeft component of the target of the second -- parameter. -- (The general correction is to not execute the 2 lines when -- -- Subtree = Temp.SubtreeLeft) -- In the original code, Tree was an uplevel-addressed variable rather than -- a parameter. -- It would have been nice if some diagnostics had identified the -- aliasing situation. (It is not your run-of-the-mill aliasing situation, -- though.) procedure test_wirth is type Node; type NodePtr is access Node; type Node is record Content : ElementTyp; SubtreeLeft, SubtreeRight : NodePtr; end record; procedure Remove (Subtree: in out NodePtr; Tree: in out NodePtr) is Temp: NodePtr; begin if Subtree.SubtreeRight = null then Temp := Tree; Tree := Subtree; ------------ -- the following two lines malfunction badly, if Subtree is Tree.SubtreeLeft, -- that is, for a call Remove(Tree.SubtreeLeft, Tree) -- the original code relied on the fact that VAR parameters are passed -- by-ref; it then produces the right effects (a "noop" in a screwy way) -- to fix, simply execute these lines conditionally Subtree := Subtree.SubtreeLeft; Tree.SubtreeLeft := Temp.SubtreeLeft; ------------ Tree.SubtreeRight := Temp.SubtreeRight; Free(Temp); else Remove (Subtree.SubtreeRight, Tree); end if; end Remove; procedure Delete ( Key : in KeyTyp; OK : out Boolean) is procedure RecDelete (Tree: in out NodePtr) is begin -- RecDelete if Tree = null then OK := False; else case StringComp (Key, Tree.Content.Key) is when Before => RecDelete (Tree.SubtreeLeft); when After => RecDelete (Tree.SubtreeRight); when Equal => if (Tree.SubtreeLeft = null) or else (Tree.SubtreeRight = null) then declare Temp: NodePtr := Tree; begin if Tree.SubtreeRight = null then Tree := Tree.SubtreeLeft; else Tree := Tree.SubtreeRight; end if; Free(Temp); end; else Remove (Tree.SubtreeLeft, Tree); end if; OK := True; end case; end if; end RecDelete; begin -- Delete RecDelete (BigTree); end Delete; begin null; end test_wirth; **************************************************************** From: Bob Duff Sent: Tuesday, March 2, 2010 7:39 PM > Here is part I of my homework on aliasing. What was your homework assignment? And what does the stuff below attempt to prove? I'm puzzled... I'm not sure what "aliaspair", "aliases caused", "alias effect", and "alias origin" mean, exactly. Name_Buffer is a (very) global variable. I have fixed at least one bug related to that fact! ;-) What tool is producing this information? ***************************************************************** From: Erhard Ploedereder Sent: Wednesday, March 3, 2010 12:27 PM > What was your homework assignment? > And what does the stuff below attempt to prove? > I'm puzzled... I'm not sure what "aliaspair", "aliases caused", > "alias effect", and "alias origin" mean, exactly. I had a homework to dig out the results of a study that we did 10 years ago about aliasing in real Ada code. The tool used is an implementation of Cooper-Kennedy, with a better filter than they used on visibility. Aliaspairs are names that denote the same object AND, for this study, cause an aliasing effect, i.e., reading via one name and writing via the other. Aliases are caused whereever the second name starts its scope (origin). Sometimes they are caused by propagation. The premise, if I remember correctly, was that all "[in] out" parameters were seen as being at risk (this is not quite true). Still, the number of aliases caused by explicit names (CK does not deal in pointer aliasing) were about 1 in 10.000 lines of code. The samples in the mail were situations that we could recover 10 years later. Without the .tar files, indeed mostly meaningless info, unless you can re-locate the situations in today's GNAT code. Question is: is it worth introducing the subsetted "no aliasing" check - if the number of aliases is so small - if the check might not even catch them For the later purpose, these samples were intended to help evaluate whether the checks would indeed catch them or how they could be amended to do so. ***************************************************************** From: Bob Duff Sent: Wednesday, March 3, 2010 1:17 PM > I had a homework to dig out the results of a study that we did 10 > years ago about aliasing in real Ada code. Thanks for the info. Interesting results (now that I understand what they mean). > Question is: is it worth introducing the subsetted "no aliasing" check > - if the number of aliases is so small > - if the check might not even catch them I don't know, but to me, the primary purpose of those checks is to keep Tucker happy -- always a worthy goal. ;-) ***************************************************************** From: Tucker Taft Sent: Wednesday, March 3, 2010 1:24 PM The current proposed checks are of two kinds: 1) Two by-copy [IN] OUT parameters of the same call that denote the same object. That is clearly weird since there is no defined ordering for the copy-back for these. 2) Two actual parameters that denote the same object occuring within a single expression or simple statement, where at least one is an [IN] OUT parameter of a function call, and it is not specified by the language rules whether that function call will occur before or after the evaluation of the call with the other parameter. Because (2) necessarily involves an [IN] OUT parameter of a function call, there is no test on existing Ada code that can determine how often these will occur. Ed verified that (1) happens very rarely in current Ada code, so there is no danger of significant incompatibility from (1). You might argue that it is unneeded, but it is an inexpensive check, and it also applies to the new functions with [IN] OUT parameters, so it will help users avoid doing clearly meaningless things with those. My own belief is that (2) will occur with some regularly in code involving calls on functions with [IN] OUT parameters, so it is an important check to help preserve the overall safety level of Ada, as we add [IN] OUT parameters to functions. In my view, there is nothing more important than preserving the overall safety level of Ada going forward. ***************************************************************** A private back and forth between Steve Baird and Randy Brukardt regarding the wording for this AI: Apr 30-May 7, 2010: Steve: Oh dear, heat vision seems to have ignited something. Consider type Ref is access Some_Type; Ptr : Ref := new Some_Type'(...); X : Some_Type renames Ptr.all; begin Ptr := new Some_Type'(...); P (Func_With_Out_Params (Ptr.all, X)); The addition of renames into the mix has introduced the possibility that two different evaluations of "P.all" may yield references to different objects. We could address this by adding something to the "both names are dereferences" bullet along the lines of "both have the same immediately enclosing statement or declaration" . Even this isn't quite right because of short circuit evaluation of functions with side effects. I don't know; what do you think? Thinking about this some more, I'm not sure that there is an issue here. I think the "same enclosing blah-blah" rule might actually be good enough. Randy: OK, I'll add that (and your example in an AARM note to show what is meant). [Editor's Note: This was the state at the point of version /06. But then...] Steve: > * both names are known to denote the same object as a third name. > > AARM Reason: "Known to denote the same object" is intended to be an > equivalence relationship, that is, it is symmetric, reflexive, and > transitive. This last bullet is needed to make the relationship transitive. I don't believe this new bullet is needed. I think it is redundant (and a note about transitivity might be appropriate). > For instance, given the following declarations: > > S : String(1..10); > ONE : constant Natural := 1; > R : Character renames S(1); > > the names R and S(1) are known to denote the same object by the sixth > bullet, and S(1) and S(ONE) are known to denote the same object by the > fourth bullet, but we need the last bullet for R and > S(ONE) to be known to denote the same object. > END AARM Reason. As you pointed out, the names R and S(1) are known to denote the same object, as are S(1) and S(ONE), So reapplying the rename rule to the names R and S(ONE), we get the desired conclusion about those two names without any new bullet. Right? Randy: You might be right that recursive application will do the trick. It's hard to wrap ones mind around, that's for sure. In any case, we need to talk about this once we have rules that actually no longer have holes. The fix(es) could destroy this property. Steve: > Two names are *known to refer to the same object* if the names are > known to denote the same object, or if one of the two names is known > to denote a subcomponent or slice of the object denoted by the others. > > This is actually shorter. > Typo: "others" => "other". The "Known to denote the same object" relationship is a relation on names. When you talk about "known to denote a subcomponent of", I think you are implicitly conjuring up nonexistent names, as in two names "are known to refer to the same object" if I could conjure up zero or more additional selectors and tack them on the end of one of the names in order to produce a name which is known to denote the same object as the other name. I think I'd prefer something like ... or if a prefix of one of the two names is known to denote the same object as the other name. but then we need to worry about renames. Given R : T renames X.Field; we'd like to say that the names R and X are known to refer to the same object, so we need something to get the effect of "or if one of the names denotes a rename whose renamed object_name is known to refer to the same object as the other name" (this is not intended to be precise wording). Randy: > The "Known to denote the same object" relationship is a relation on > names. When you talk about "known to denote a subcomponent of", I > think you are implicitly conjuring up nonexistent names, as in > > two names "are known to refer to the same object" if I could > conjure up zero or more additional selectors and tack them on > the end of one of the names in order to produce a name which > is known to denote the same object as the other name. I suppose you can think of it that way. It doesn't sound good... > I think I'd prefer something like > > ... or if a prefix of one of the two names is known to denote > the same object as the other name. That's too broad. Prefixes are used in cases other than subcomponents and slices, such as attribute references and subprogram calls. If F is a function, we surely don't want X and X.F to be known to refer to the same object. (Not to mention dereferences). It seems to me that handling this properly will get *really* messy. > but then we need to worry about renames. Given > > R : T renames X.Field; > > we'd like to say that the names R and X are known to refer to the same > object, so we need something to get the effect of > "or if one of the names denotes a rename whose renamed > object_name is known to refer to the same object as the > other name" > (this is not intended to be precise wording). Here being another reason this idea is nasty. Want to try again closer to the original wording?? The idea of "conjuring" selectors, indexes, or slices is a lot simpler, and it is obvious to the compiler what is needed (if anything). Steve: > Want to try again closer to the original wording?? The idea of "conjuring" > selectors, indexes, or slices is a lot simpler, and it is obvious to > the compiler what is needed (if anything). You are right that Prefix is too general an idea to use here. How about a more constrained, specific but still prefix-based approach? ==== Two names are known to refer to the same object if The two names are known to denote the same object; or One of the names is a selected_component, index_component, or slice and its prefix is known to refer to the same object as the other name; or One of the two names statically denotes a renaming declaration whose renamed object_name is known to refer to the same object as the other name. ==== Randy: I said yesterday: > I was worried that indexed_components also would have a similar > problem to dereferences, but I think we're OK because renames is > essentially by-reference, and indexes have to be discrete. A constant > would be a problem, but since it is technically a different object, it > wouldn't hit under these rules. > > S : String(1..10); > I : Natural := 5; > R : Natural renames I; > begin > I := 3; > P (S(I), S(R)); -- OK, both I and R are 3 here. I wasn't clever enough yesterday. Try: S : String(1..10); I : Natural := 5; R : Character renames S(I); begin I := 3; P (S(I), R); -- !!! Here, the bullets we currently have make S(I) and R "known to denote the same object": but S(I) = S(3) and R = S(5)! Those can't possible denote the same object because they're different objects! Fixing this seems difficult; we could just drop the part about the indexes being known to denote the same object, but that is nasty because it means that S(I) is not known to denote the same object as S(I). (Which is going to allow a lot of errors through.) Else we have to add the same scope rule that you have for dereferences. That's also ugly. A final idea is to apply that scope rule more generally. That is, make the statically denotes bullet require the same scope of the reference. Then I think we wouldn't need that rule for dereferences or for indexed components. Note that some of these fixes endanger the sense of equivalence here. Steve: Good point. I tried making use of the "same evaluation" wording that is used in 4.9.1 (the phrase is also used in 4.5.2, but that doesn't help because it is dynamic semantics). Let's try replacing * both names are indexed_components, their prefixes are known to denote the same object, and each of the pairs of corresponding index values are either static expressions with the same value or names that are known to denote the same object; or with * both names are indexed_components, their prefixes are known to denote the same object, and each of the pairs of corresponding index values are statically known to have the same value. But of course, now we need something like Two discrete expressions are statically known to have the same value if * they are static expressions with the same value; or * both expressions are names and the two names are known to denote the same constant object; or * Constructs to watch out for here include factored lists X, Y : constant Integer := Function_Call; -- X and Y should not be statically known to have the same value and constant views of non-constants X : aliased Integer; Y : constant access constant Integer := X'Access; Z1 : constant Integer := Y.all; X := X+1; -- somehow get this stmt into the decl list Z2 : constant Integer := Y.all; -- Z1 and Z2 should not be statically known to have the same value , but I think these cases fall out (I mention them because this is still up in the air, so we just want to remember to check that they are handled right). Does this seem like a reasonable approach (i.e., worth fleshing out the details)? Randy: Yes, but it doesn't handle the case that was the original motivation for including the same objects in the first place: Swap (S(I), S(I)); It wouldn't be very helpful if this case wasn't detected, especially as it represents a bug (either nothing is needed or S(J) was meant for the second parameter). I was wondering if there was a problem with the above having to do with functions with "in out" parameters, but I think those would be illegal by other checks. That is, if we have function Bump (I : in out Natural) return Natural is begin I := I + 1; return I - 1; end Bump; In: P (S(I), Bump(I), S(I)); We don't know if the two S(I)'s are the same object or not. But this is already illegal (I hope! It's the classic problem) so it is irrelevant. You could do something similar with side-effects, but since the results would not be portable, it's OK to reject the call (this is the same reason that we use to justify rejecting the explicit problem). So I think we would need a bullet similar to the one for dereferences. * The expressions are both object_names, the names are known to denote the same object, and both names have the same immediately enclosing statement or declaration; Thoughts? Steve: > Yes, but it doesn't handle the case that was the original motivation > for including the same objects in the first place: > > Swap (S(I), S(I)); > Good point. > > So I think we would need a bullet similar to the one for dereferences. > > * The expressions are both object_names, the names are known to > denote the same object, and both names have the same immediately > enclosing statement or declaration; > > Thoughts? Looks good. Would you want to relax the "same immediately enclosing" rule in the the case of a constant? The idea that you are trying to capture here is that the two evaluations of this name will yield the same value and that ought to be true for a constant regardless of where the two names occur (although it might not be true for a constant view of a variable; but down this road lies madness, or at least complexity. Do we really want to worry about the fact that a "constant"'s finalization can modify it?). Randy: Maybe, but I think it would have to be a real discrete constant (a stand-alone constant), rather than a constant component of a composite type (or a constant view of a variable), as those can change via other channels. Not sure it is worth the effort because of that limitation (and because if the constant is considered static, it is already covered anyway). You could conceive of handling expressions of predefined operators as well: C : constant := 1; Swap (S(I+1),S(I+C)); * Both expressions are calls to the same predefined operator, and the corresponding operands of the expressions are statically known to have the same value. This of course can quickly become a slippery slope. (How about selected attributes? And on and on and on...) Steve: I think I like your approach of limiting this to top-level discrete constants, avoiding anything to do with composite or access types. Your exploration of the slippery slope associated with more general approaches has convinced me that we don't want to go there. Randy: OK. Please propose some wording (to be part of your editorial review, of course, and to be reviewed during our phone call on May 20). Steve: Do we now only need wording for this new "statically known to have the same value" (today, I think I prefer "statically known to be equal") relationship between two discrete expressions ? Randy: I think that is it. You had: * both names are indexed_components, their prefixes are known to denote the same object, and each of the pairs of corresponding index values are statically known to have the same value. And that looks OK (modulo whatever term you choose). You also were trying to convince me that we don't need a transitivity rule. I'm still a bit dubious, but you might be right (I haven't been able to figure out a problem case yet, although it seems that finding the right intermediate name might be difficult in some cases). And then I believe you wanted to change the "refer to" definition: Two names are known to refer to the same object if The two names are known to denote the same object; or One of the names is a selected_component, indexed_component, or slice and its prefix is known to refer to the same object as the other name; or One of the two names statically denotes a renaming declaration whose renamed object_name is known to refer to the same object as the other name. I think this is OK as well. Steve: I don't remember if we talked about "enclosing statement or declaration" anywhere else, but it should probably be replaced with "immediately enclosing complete context" if we did (consider a pragma). I've made that replacement below. We need to be consistent with regard to "known" vs. "statically known" in defining these various new terms. I'm happy with whatever is consistent with existing wording. In the case of this new predicate (below), I realize that it is so far from being bulletproof that I don't like using the word "known" at all. Hence "very likely". I changed from "statically known" to "very likely" when I starting thinking about if Function_With_In_Out_Params (X(I), Function_Call_That_Modifies_I, X(I)) then or the various cases involving modification of constants (finalization routines, Current_Instance_Of_A_Limited_Type'Access, pragma Import). If we are really only talking about a 90% heuristic, then let's not pretend otherwise. What do you think? ==== Given two names or expressions of the same discrete type, one is *very likely to have the same value* as the other if * both are static expressions and their values are the same; or * both are names and the two names are statically known to denote the same object and that object is a constant object; or AARM note: A dereference of an access-to-constant value denotes a constant view of a potentially variable object, not a constant object. * both are names and the two names are statically known to denote the same object and both names have the same immediately enclosing complete context (see 8.6); or * one of the two is a name which statically denotes a renaming declaration whose renamed object_name is very likely to have the same value as the other name name or expression; or * one of the two is a name which statically denotes a non-deferred constant object whose initialization expression is very likely to have the same value as the other name or expression. Randy: > We need to be consistent with regard to "known" vs. "statically known" > in defining these various new terms. > I'm happy with whatever is consistent with existing wording. I think we used "statically known" to make it clear that we are talking about compile time, not whether A(I) and A(J) are the same at runtime. I can't think of any existing wording off-hand other than that in this clause. > In the case of this new predicate (below), I realize that it is so far > from being bulletproof that I don't like using the word "known" > at all. Hence "very likely". > > I changed from "statically known" to "very likely" when I starting > thinking about > > if Function_With_In_Out_Params > (X(I), Function_Call_That_Modifies_I, X(I)) then > > or the various cases involving modification of constants (finalization > routines, Current_Instance_Of_A_Limited_Type'Access, pragma Import). > If we are really only talking about a 90% heuristic, then let's not > pretend otherwise. I think "very likely" is going to turn people *way* off. Moreover, it isn't that hard to close the vast majority of these loopholes: restrict the constant object case to elementary objects. That's what we did in 3.3(13/3), after all, for these very same reasons. Moreover, for the intended use, we really only care about elementary constants anyway. And I know I suggested that a few days ago. And I rationalized ignoring the "Function_Call_That_Modifies_I case" in a previous message. The use of these rules is to find cases where the undefined order-of-evaluation could cause differing/non-portable results. The above *surely* is such a case. Similarly, if some uses an address clause or something like that to modify values under the covers, they've got problems anyway: the code is insanely tricky. It's OK to blow them away. :-) So I think it is OK to call it "known", although we might want a AARM To Be Honest note that suggests care if someone ever tries to use this definition elsewhere, because it is not bulletproof (although it is in any reasonable program). ... > * one of the two is a name which statically denotes > a non-deferred constant object whose initialization expression > is very likely to have the same value as the other name > or expression. I'd stick with "known", and add the requirement to be elementary in the two constant object rules. Otherwise, looks fine to me. Steve: > I think "very likely" is going to turn people *way* off. I agree, but only because it is an accurate description. If we called it "known", then people won't be turned off until they figure out that this is a case of false advertising. Still, I agree with you - when we start defining technical terms with names like "as near as I can tell", it does seem that we've taken a wrong turn somewhere. > Moreover, it isn't > that hard to close the vast majority of these loopholes: restrict the > constant object case to elementary objects. I think the wording I sent does that. In which case I shouldn't have mentioned finalization as one of the FUD-sources regarding the value of a constant; ditto for Rosen-tricks and the like. So you are right. > That's what we did in 3.3(13/3), > after all, for these very same reasons. Moreover, for the intended > use, we really only care about elementary constants anyway. And I know > I suggested that a few days ago. Repeating myself, I think I got the wording right with respect to this issue - it was just my color commentary that needed to be toned down. > And I rationalized ignoring the "Function_Call_That_Modifies_I case" > in a previous message. The use of these rules is to find cases where > the undefined order-of-evaluation could cause differing/non-portable results. > The above *surely* is such a case. > Similarly, if some uses an address clause or something like that to > modify values under the covers, they've got problems anyway: the code > is insanely tricky. It's OK to blow them away. :-) You're not too concerned about an imported constant, even if it is subject to a Volatile pragma? I guess I can live with that. Especially when a user can work around any problems in this area by adding a set of parens. If we reject F (A(I), A(I)); , I think we'll still accept F (A ((I)), A(I)); Ada is generally a very readable language, but when you wander off into the corner cases, it can get weird. > So I think it is OK to call it "known", although we might want a AARM > To Be Honest note that suggests care if someone ever tries to use this > definition elsewhere, because it is not bulletproof (although it is in > any reasonable program). Agreed. Randy: > You're not too concerned about an imported constant, even if it is > subject to a Volatile pragma? I guess I can live with that. Hadn't thought of that one. But of course B.1(38.1) sort of saves us: Ada semantics is that the value doesn't change. If it *does* change, the program is erroneous. And of course, the result is an error message that isn't accurate -- but the code is very tricky if they meant to depend on that, and if they didn't, well then they have a bug. > Especially when a user can work around any problems in this area by > adding a set of parens. If we reject > > F (A(I), A(I)); > , I think we'll still accept > F (A ((I)), A(I)); > > Ada is generally a very readable language, but when you wander off > into the corner cases, it can get weird. That seems like a bug, but it isn't worth trying to fix it. After all, the purpose of these rules was to make Tucker happy. He seems happy now, let's not disturb his slumber. ;-) [Editor's note: If you are still reading this at this point, you are at least as crazy as Steve and I. You deserve a libation of your choice. :-)] *****************************************************************