Version 1.2 of ai05s/ai05-0144-1.txt

Unformatted version of ai05s/ai05-0144-1.txt version 1.2
Other versions for file ai05s/ai05-0144-1.txt

!standard 6.04 (09)          09-02-17 AI05-0144-1/01
!class Amendment 09-02-15
!status work item 09-02-15
!status received 09-02-15
!priority High
!difficulty Hard
!subject Detecting dangerous order dependencies
!summary
** TBD **
!problem
Ada does not define the order of evaluation of parameters in calls. This opens up the possibity that function calls with side-effects can be non-portable. The problem gets worse with "in out" and access parameters, as proper evaluation may depend on a particular order of evaluation.
Arguably, Ada has selected the worst possible solution to evaluation order dependencies: it allows such dependencies (by not specifying an order of evaluation), does not detect them in any way, and then says that if you depend on one (even if by accident), your code will fail at some point in the future when your compiler changes.
Something should be done about this.
!proposal
** TBD **
!wording
!discussion
In order to discuss this topic, we need to look at some examples.
type A_Rec is {tagged} record C : Natural := 0; end record; -- Note: We're going to talk about this record both as if it is -- tagged, and then as if it is not tagged and passed by copy. -- Thus the {tagged} here.
procedure P1 (Input : in A_Rec; Output : in out A_Rec) is begin Output.C := Output.C + Input.C; end P1;
function F1 (Bumpee : access A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F1;
function F2 (Bumpee : in out A_Rec) return A_Rec is begin Bumpee.C := Bumpee.C + 1; return (C => Bumpee.C - 1); end F2;
function F3 (Obj : in A_Rec) return Natural is begin return Obj.C; end F3;
function F4 (Obj : access A_Rec) return Natural is begin return Obj.C; end F4;
O1 : {aliased} A_Rec := (C => 1); -- "aliased" only if needed by -- the particular example below.
type Acc_A_Rec is access all A_Rec; A1 : Acc_A_Rec := O1'access;
The usual concern about the use of in out parameters in functions begins something like:
Imagine writing an expression like:
if F3(O1) = F3(F2(O1)) then
This expression has an evaluation order dependency: if the expression is evaluated left-to-right, the result is True (both values have (C => 1) and O1.C is set to 2 afterwards), and if the expression is evaluated right-to-left, the result is False (the right operand is still (C => 1), but now the left operand is (C => 2), and O1.C is still 2 afterwards).
This is usually used as a reason to not allow in out parameters on functions. If you have to use access parameters, then the expression is:
if F3(O1) = F3(F1(O1'access)) then
and the use of 'access and aliased on the declaration of O1 should provide a red flag about the possible order dependence.
However, this red flag only occurs some of the time. First of all, access objects are implicitly converted to anonymous access types, so no red flag is raised when using them:
if F3(A1.all) = F3(F1(A1)) then
Perhaps the .all on the left-hand argument could be considered a red flag. But of course that doesn't apply if that function also takes an access parameter:
if F4(A1) = F3(F1(A1)) then
We have the same order dependency, but there is no sign of a red flag here.
This is all Ada 95 code, but Ada 2005 makes this situation worse by adding prefix views and implicit 'access. If A_Rec is tagged, we can write:
if O1.F3 = O1.F1.F3 then
And since tagged parameters are implicitly aliased, if O1 is a tagged parameter, there isn't the slightest sign of a red flag here.
This shows that we are already in a very deep pit. One can argue whether moving the rest of the way to the bottom is that significant. [Thanks to Pascal Leroy for showing just how bad the current situation is.]
We can show similar problems with procedure calls. Consider:
P1 (Input => F2(O1), Output => O1);
If O1 is tagged (and thus passed by reference), this will set O1.C to 3 in either order of evaluation. (In both cases, Input will have (C => 1) and output will have (C => 2)).
But if O1 is not tagged (and thus passed by copy), this will set O1.C to 3 if evaluated left-to-right [F2 sets O1.C to 2, then passes (C => 1) to Input; Output is passed (C => 2); and then that summed to (C => 3)] and O1.C will be 2 if evaluated right-to-left [Output is passed (C => 1); F2 sets O1.C to 2, then passes (C => 1) to Input; and then that is summed to (C => 2)].
We can write similar code in Ada 95:
P1 (Input => F1(A1), Output => A1.all);
getting the same order dependence.
More generally, an expression can only have an order dependency if there is something that is passed by-copy (or otherwise creating a new object as part of the expression). Recall that all function results create a new object, even for tagged types. This fact suggests ways to limit rules (discussed later).
[Editor's note: Is this really true? I've convinced myself that it is, because in any such case, any updates are reflected immediately on the object - and the order of modifications within the subprogram body is determined by the code. Or are reorderings permitted? I would think not, presuming that the ultimate result is printed out or otherwise considered an "external interaction". Note that it is not a bounded error to modify an object by distinct access path if the object is a by-reference type (6.2(12)). Indeed, it might actually take two such newly created objects to create an order dependence, I haven't been able to come up with a case envolving only one newly created object -- I think that's because such a newly created object can't be passed as an in out parameter. Note that I haven't been able to find any use for this information, so maybe I should just dump it.]
Questions to answer:
Should the rules (whatever they are) apply to all calls or just functions with in out parameters? The latter position clearly is completely compatible, but it obviously leaves many cases of order dependence undetected.
Another issue is whether the rules should apply to all writable parameters (that is all in out and out parameters, and all access-to-varaible parameters), or just to writable by-copy parameters (which have to be in out or out). Clearly, problems can happen in any of these cases. But Ada has lived with the by-reference cases for decades, and adding in out parameters to functions doesn't change anything here (for a by-reference parameter, an in out parameter is equivalent to an access parameter of the same type). As noted in the discussion above, the real problems occur when new objects are created by intermediate expressions. "By-copy" in this context still means any parameter that might be passed by-copy: any type that is not known to be a by-reference type (clearly including untagged private types). It also has to include return objects of all types.
Note that these two issues are not completely independent of each other: limiting checks to by-copy parameters also limits the added incompatibility, (and the likelyhood that the rules are really preventing errors).
Survey of solutions
It fairly obvious that that order dependencies are a problem in Ada, and have been getting worse with each version of the language (even without in out parameters in functions). Moreover, it has been used as the primary reason for leaving something natural and useful (in out parameters for functions) out of the language.
We could do nothing, saying that since we're already nearly at the bottom of the pit, moving down two inches to the bottom will not have any practical effect. It surely would be easier than trying to cling to the side of the pit! But perhaps we can do better.
It should be noted immediately that (almost) anything we could do could not detect order dependencies that are caused by side-effects inside of functions. Rules enforced on a call (or syntax, or whatever) can only prevent problems caused by side-effects visible to the call.
We could limit in out parameters for functions to by-reference types (effectively to only tagged types). That clearly will not introduce any new problems, as shown by the examples that start out this section. But argubly it would make them more likely, and in any event, it would leave all of the existing nasty cases left unchecked.
One obvious solution would be to define the order of evaluation, eliminating the problem at the source. Java, for instance, requires left-to-right evaluation. However, that would encourage tricky code like the various examples by making it portable. Moreover, Ada compilers have been using this flexibility for decades; trying to remove it from compilers (particularly from optimizers) could be very difficult. Note that this is the only solution that actually could eliminate dependencies on side-effects inside of functions. But defining the order of evaluation was considered for both Ada 83 and Ada 95 and was deemed not worth it -- it's hard to see what has changed.
Another option would be to increase the visibility of parameters with side-effects. This sounds somewhat appealing (after all, it seems to be the basis on which access parameters are deemed OK and in out parameters are not). One possibility would be to add ordering symbols to named notation: Param <- <expr> for an in parameter (this includes access); Param -> <expr> for an out parameter; and Param <-> <expr> for an in out parameter.
However, for compatibility, old code that don't use the symbols would have to be allowed. That's especially bad because the symbol for ordinary named parameters (=>) looks like the symbol for an out parameter; while it usually will be an in parameter. Moreover, this solution does nothing for positional parameters in calls.
One could imagine requiring the syntax for calls of functions with in out parameters and making it optional elsewhere. That might placate in out parameter opponents, but otherwise doesn't seem to do much for the language.
Finally, we come to some sort of legality rules and/or runtime checks for preventing such order dependencies. It is important to note that making such rules too simple (and strong) only would mean that temporaries have to be introduced in some expressions. That would be an annoyance, but surely not as bad as the current ticking time-bomb.
The easiest option is to blame all of the problems on functions with in out parameters and make them stand alone. The rule would be something like:
A call of a function with an in out parameter must be the only call in an expression.
That would mean that
if F2(O1) then
would be legal (assuming F2 returned type Boolean), but
if F2(O1) = (C => 1) then
would not be. Obviously, this is too strict. Amazingly, it also not strict enough:
Some_Array(F3(O1)) := F2(O1);
would be allowed. (An assignment statement is not an expression!)
A call of a function with an in out parameter must be the only call in an expression or statement; If a call of a function with an in out parameter is the source expression of an assignment_statement, the target variable_name shall not include a call, an indexed component, a slice, or a dereference.
This would allow assigning to temporaries and record components (which can't be computed) and not much else.
Of course, this is completely compatible with Ada 95 and later; but it doesn't do anything to detect the existing cases of problems. Maybe that's not important.
In order to have an order dependence, there has to be two (or more) uses of an single object within an expression (or statement). But of course that object could be aliased via parameter passing or access values, a part of a larger object, or computed (as in an array component). In addition, one of the uses of the object has to be modified (that is, it is passed as an in out or out parameter, or it is passed as the designated object of an access type parameter). Finally, in the smallest call (subexpression) that contains both uses of the object (that is the call where the order dependence can occur), the use that modifies the object cannot be directly a parameter of that call.
This last part is shown by a procedure call like (we're using the declarations at the head of this section again):
P1 (Input => (C => F3(O1)), Output => O1);
Since the use that modifies O1 is in the top-level procedure call, it won't be modified until that call is underway. The other parameter(s) will have been evaluated by then.
Turning this into rules is not hard, except for defining a "single object". The easiest way to do that is to simply use type information:
Two objects are considered to be "potentially the same object" if one has the type of a part of the other, or one object's type is a partial view. [The last part is to avoid privacy issues - we don't want to be looking inside of private types to figure this out, and we want to asssume the worst. Remember that 'part' includes the type itself.]
A call is legal only if:
For each name N that is passed to some inner call (not including the call itself) as the actual parameter to a formal in out or out parameter, or is passed as the designated object of a formal parameter of an access type, there is no other name in the expressions of the actual parameters of the call other than the one containing N that denotes potentially the same object.
For the purpose of checking this rule, an assignment_statement is considered a call with two parameters.
[Randy: I ran out of time and energy here, just as I was getting to the good part.]
TBD: This is obviously too strict. Show some examples why. Also, reason out the rules for private types (they're probably not correct; they don't seem to handle components of a private type; also, try to figure out if the contents actually matter -- they may not as we're only interested in what is visible at the point of the call, right?). Try to relax the rules by using more information about when objects could be "potentially the same object", such as local variables that aren't aliased, pool-specific access types, etc.
Consider Runtime checks if we don't want to make have to assume two array components are the same?? End TBD.
!example
(See discussion.)
!ACATS test
!appendix


****************************************************************

Questions? Ask the ACAA Technical Agent