!standard 3.10(14/3) 17-10-09 AI12-0240-1/01 !class Amendment 17-10-09 !status work item 17-10-09 !status received 17-10-05 !priority Very_Low !difficulty Hard !subject Access value ownership and parameter aliasing !summary ** TBD. !problem When attempting to prove properties about a program, particularly programs with multiple threads of control, the enemy is often the unknown "aliasing" of names introduced by access types and certain uses of (potentially) by-reference parameters. By unknown aliasing of names, we mean the situation where two distinct names might refer to the same object, without the compiler being aware of that. A rename introduces an alias, but not an "unknown alias," because the compiler is fully aware of such an alias, and hence is not something we are worrying about here. On the other hand, if a global variable is passed by reference as a parameter to a subprogram that also has visiility on the same global, the by-reference parameter and the global are now aliases within the subprogram, and the compiler generating code for the subprogram has no way of knowing that, hence they are "unknown" aliases. One approach is to always assume the worst, but that makes analysis much harder, and in some cases infeasible. Access types also introduce unknown aliasing, and in most cases, an analysis tool will not be sure whether the aliases exist, and will again have to make worst-case assumptions, which again may make any interesting proof infeasible. !proposal This AI introduces a restriction against parameter aliasing (No_Parameter_Aliasing), and an aspect called "Ownership" for access types, which together can reduce unknown aliasing to the point that the only remaining unknown aliases involve references to elements of the same array using dynamic indices. This sort of aliasing is felt to be manageable by most analysis tools, as it is restricted to determining the possibility of numeric equality between such indices. Access type Ownership: If an access-to-variable type has the value True for the aspect Ownership, then certain operations on the access type are restricted, and certain operations result in an automatic deallocation of the designated object. The basic rule is that at most one object of such an access-to-variable type may be used to refer to the designated object at any time. There might be multiple access-to-variable (or access-to-constant) objects that designate the same object at any given time, but at most one of them may be dereferenced. All of the others are considered "invalid." Conversely, there might be multiple "valid" access-to-constant objects that designate a given object, so long as all access-to-variable objects designating the object are at that point invalid (or "read-only-valid" -- see below). This is the usual "single writer, multiple reader" rule for safe access to a shared variable. An access-to-variable object can also become "read-only-valid," which means that it can only be used for read-only access to the designated object. Finally, an access-to-variable object can become "frozen invalid" which means it is invalid, but further it cannot be re-assigned a new value. An access-to-variable object that is valid or read-only-valid is considered to be the "owner" of its designated object. There should be exactly one owner of every existing object created by an allocator for an access type with a True Ownership aspect. While read-only-valid, the access object is itself considered read-only. Certain operations change which objects are valid, read-only-valid, invalid, or frozen invalid, to preserve the above overall objective. In addition, automatic deallocation occurs at certain points: * Default initialization of an access-to-variable object (including a component) of a type with a True Ownership aspect: # Static semantics: The object starts out as valid but null. The rules governing the initial assignment if any are covered below, as are assignments from a later assignment statement. * Assignment to an access-to-variable object: # Legality rule: The RHS shall be valid, an allocator, or the literal null. If not a null literal or an allocator, the RHS shall be local to the nearest enclosing body (or be a parameter thereof). The LHS shall not be read-only-valid nor frozen invalid. # Static semantics: The LHS becomes valid, and the RHS becomes invalid (if not the literal "null" or an allocator). [Ownership is transfered to the LHS.] # Dynamic semantics: If the LHS is valid and non-null prior to the assignment, the object it designates is finalized and deallocated. * Assignment to an access-to-constant object: # Legality rule: The RHS shall be valid or read-only-valid (we ignore here cases where the RHS is null or an allocator). The RHS shall be local to the nearest enclosing body (or be a parameter thereof). The accessibility level of the type of the LHS shall be deeper than that of the nearest enclosing body. # Static semantics: The LHS becomes valid, and the RHS becomes read-only valid. [Redundant: Ownership of the designated object is not transferred, but the access object itself and the designated object are both now effectively read-only.] The RHS remains read-only-valid until the master of the type of the LHS is finalized. * Assignment to a composite variable with some access object subcomponents # Equivalence: Legality checks and semantics are those of an equivalent set of individual subcomponent assignments, presuming the array subcomponents all use static indices (see Arrays section below). * Assignment to a composite constant with some access object subcomponents (as part of initialization, presumably) # Legality rule: All access object subcomponents of the RHS shall be valid or read-only-valid. # Static semantics: If the RHS is an aggregate or build-in-place function call, the LHS and the RHS are considered a single object for the purposes of these rules. and the access subcomponents are considered valid. If the RHS is an existing object, the access subcomponents of the LHS are read-only-valid, and the access subcomponents of the RHS become read-only-valid, and remain so until the finalization of the LHS, at which point they are restored to their prior state. * Dereference # Legality rule: The access object shall be valid or read-only-valid # Static semantics: A dereference of a read-only-valid access object provides a constant view of the designated object (even though the access object is of an access-to-variable type). A dereference of a valid access-to-variable object provides a variable view and takes ownership of the object; in this case the access object is frozen invalid as long as the dereference exists. [NOTE: This allows the data structured rooted at this object to be walked with the pointer to the root restored to validity at the end of the lifetime of the dereference; renaming or passing as a parameter can be used to extend the lifetime of the dereference over the period of exploration of the data structure.] * Component selection # Static semantics: An access object that is a subcomponent of a constant view is considered read-only-valid. [Redundant: An access object that is a subcomponent of a variable view can be valid, read-only-valid, invalid, or frozen invalid according to what operations have been applied to it or the enclosing object.] * Parameter passing, IN parameter + If the formal parameter is of an access-to-variable type: # Legality rule: The actual shall be valid, the literal null, or an allocator. # Static semantics: At the point of call, the formal becomes valid and the actual becomes frozen invalid (if not the literal null or an allocator). At the end of the subprogram, the actual becomes valid again; if the actual is an allocator the object is reclaimed at this point. + If the formal parameter is of an access-to-constant type: # Legality rule: The actual shall be valid or read-only-valid (we ignore the literal "null" or allocators in this case). # Static semantics; At the point of call the actual becomes read-only-valid; the formal starts out (and remains throughout its existence) read-only-valid. At the end of the subprogram, the actual is restored to its prior state. + If the formal parameter is a composite object with access subcomponents: # Legality rule: All access subcomponents of the actual shall be valid or read-only-valid. # Static semantics: All access subcomponents of the actual become read-only-valid. The access subcomponents of the formal are read-only-valid throughout their existence. At the end of the subprogram, the access subcomponents of the actual are restored to their prior state. If the actual is an aggregate, all of the access subcomponents are finalized[Redundant:, reclaiming any objects they designate if non-null]. * Parameter passing, IN OUT parameter + If the formal parameter is of an access-to-variable type: # Legality check: The actual shall be valid at the point of call. At a return statement or the end of the subprogram, the formal shall be valid. # Static semantics: At the point of call, the formal becomes valid and the actual becomes frozen invalid. After the call, the actual becomes valid again. + If the formal parameter is of an access-to-constant type: illegal. + If the formal parameter is a composite object with access subcomponents: # Equivalence: This is equivalent to having a set of IN OUT formal parameters, one for each such access subcomponent, with the corresponding subcomponent of the actual being passed to the subcomponent of the formal. * Parameter passing, OUT parameter + If the formal parameter is of an access-to-variable type: # Legality check: At the point of call, the actual shall not be read-only-valid. At a return statement or the end of the subprogram, the formal shall be valid. # Static semantics: At the point of call, the actual becomes frozen invalid (which it might already be), and the formal starts out invalid. After the call, the actual becomes valid. # Dynamic semantics: Upon return from the subprogram, if the original value of the actual was valid, and the final value of the formal is not the same as that of the original value of the actual, the object designated by the actual is finalized and deallocated, prior to copying back from the formal to the actual. + If the formal parameter is of an access-to-constant type: illegal. + If the formal parameter is a composite object with access subcomponents: # Equivalence: This is equivalent to having a set of OUT formal parameters, one for each such access subcomponent, with the corresponding subcomponent of the actual being passed to the subcomponent of the formal. * Arrays # Static semantics: An array as a whole can be valid, read-only-valid, or invalid, or the array can be statically partitioned into three sets of components, one the valid components, one the read-only-valid components, and one the invalid components (the "as a whole" cases can be seen as special cases of this more general case, with two of the three partitions being empty). If other rules would indicate that a statically-indexed component is made valid, read-only-valid, or invalid, that one component is affected. If other rules would indicate that a dynamically-indexed component is made read-only valid, the entire array becomes read-only-valid. # Legality rules: If a legality rule requires a dynamically-indexed array component to be valid or read-only-valid, the array shall have no components that violate the requirement. If other rules would indicate that a dynamically-indexed component is to become valid or invalid, all components of the array shall already have that status. [Redundant: Note this implies that you cannot transfer ownership from a dynamically indexed subcomponent, as that action would necessarily require that it be valid before, and invalid afterward. You can make a read-only copy of one that is valid or read-only-valid, resulting in the whole array becoming read-only-valid. You can also assign into one that is already valid, as that preserves the state.] * Finalization of an object (including a component) being of an access type with a True Ownership aspect. # Legality check: An access-to-variable object that was valid when initialized shall not be read-only-valid when finalized. # Dynamic Semantics: When an access-to-variable object is finalized, if the object is valid, the object it designates (if any) is finalized and deallocated. Control flow: # Legality Rule: When two control paths converge within a subprogram, such as the "then" and "else" part of an "if" statement, or the statements before a loop and the end of the body of the loop, the status of an access object shall not be valid or read-only valid on one path, while being invalid or frozen invalid on the other. # Static Semantics: If the status is the same on both paths, the status remains that at the join point. If the status on either path is read-only-valid, the status is read-only-valid after the join point. Similarly, if the status on either path is frozen invalid, the status is frozen invalid after the join point. TBD: Conversion rules? 'Access? No_Parameter_Aliasing Restriction: The No_Parameter_Aliasing restriction disallows passing overlapping parts of a single variable as a composite or aliased parameter, more than once within the same expression or simple statement, unless all are as IN parameters. Furthermore, a part of a variable that is global to the called subprogram or task entry may not be passed as a parameter if the formal is composite or aliased, unless the Global aspect that applies to that subprogram implies that it makes no reference to that part of the global variable, or it makes only read references to that part of the global variable, and the part is passed as an IN parameter. Variables of a private type are presumed composite, for the purpose of this rule. If two parts of the same array are passed as composite or aliased parameters in the same expression or simple statement, and at least one is an OUT or IN OUT parameter, a check is made that the parts do not overlap. If this check fails, Program_Error is raised. A corresponding check is performed if the Global aspect of the called subprogram identifies a part of an array, to ensure that an overlapping part of that array is not passed as a composite or aliased parameter, if either the formal or the global is writable in the called subprogram. If parts of two dereferences of the same type are passed as composite or aliased parameters, and at least one of them is an OUT or IN OUT parameter, and the parts would overlap if the dereferenced access values were the same, a check is made that the access values are not the same. If this check fails, Program_Error is raised. [NOTE: The above is a bit naive, and needs more words to deal with cases like slice of dereference of access-to-array, etc. But hopefully you get the idea! Also, if all access types have Ownership, the last check should never be necessary. Note further than this does not worry about aliasing which occurs internal to the subprogram due to dereferencing an access value, as that is considered something that can only be dealt with effectively using the Ownership aspect.] [Possible Alternative: Drop the "composite or aliased" part in all the above paragraphs. It seems like a minor capability to be allowed to pass the same elementary object as both an IN and an [IN] OUT parameter, and this change would obviate the need to do the additional subtle Ada 2012 checks for cases where the same elementary object appears more than once as an [IN] OUT parameter. Perhaps parenthesizing of an IN parameter that is the name of a variable could be used to mean the compiler may insert a copy at that point to break aliasing?] !discussion The objective of this pair of features, the Ownership aspect and the No_Parameter_Aliasing restriction, is to reduce aliasing to the point where programs using pointers and pass-by-reference can nevertheless be fully analyzed, in particular for compile-time detection of data races. The Ownership aspect is a relatively simple idea, where an object is owned by exactly one pointer at any time. When an object is owned, it can be deallocated when the owner is assigned to be null or point at a different object. We want to allow read-only copies to be used to walk a data structure, so we allow access-to-constant copies so long as they are of a type that goes away before the original object is finalized. Most commonly these will probably be stand-alone objects of an anonymous access-to-constant type. Similarly, a long-lived dereference of the root of a data structure may be used to walk a data structure destructively, while restoring the validity of the access object designating the root after the dereference goes away. To change what is the root object itself requires a direct assignment to the owner of the root, which cannot occur while the dereference exists. As noted we have four states for an access object: valid, read-only valid, invalid, and frozen invalid. A valid access object is the owner, and provides read-write access to the designated object. Assigning a new value to a valid access object causes the prior designated object to be deallocated. A read-only-valid access object is a shared owner, and provides read-only access to the designated object. A read-only-valid access object cannot itself be set to point elsewhere. An invalid access object is not useful until it is assigned a valid value, or until its validity is restored because all copies have been finalized. A frozen invalid access object cannot be copied and cannot be assigned a new value, and is used to represent a state of an access object that is necessarily transitory due to the existence of another (temporary) owner, such as a dereference or a formal parameter. Tracking validity of stand-alone access objects is pretty straightforward. Tracking validity of components is more challenging, and perhaps additional rules over and above those given above will be needed. In particular, we have proposed rules for dynamically-indexed array components. They will need to be studied and used in examples to know whether they are adequate and usable. For the No_Parameter_Aliasing rules, we fall back on run-time checks to prevent aliasing when necessary for "dynamic" names. Hopefully such checks can be eliminated at compile-time in almost all interesting cases. SPARK would complain if such a check could not be proved at analysis time. The work on this topic was started in earnest by an intern at AdaCore, Georges-Axel Jaloyan, mentored by Yannick Moy. Many of these concepts are inspired by the "borrow checker" of the Rust language, and/or the anti-aliasing rules of the ParaSail language and the SPARK language. !ASIS [Not sure. It might be necessary to have a query for the new aspects. - Editor.] !ACATS test ACATS B-Tests and C-Tests are needed to check that the new capabilities are supported. !appendix From: Tucker Taft Sent: Thursday, October 5, 2017 12:05 AM Here is a new AI [Version /01 of the AI - Editor] based on generalizing to Ada some work we have been doing on adding "safe" pointers to SPARK. I have coupled this with a no-parameter-aliasing restriction because you really need both to get much benefit from either. We have several customers, both SPARK users and "regular" Ada users, who have quite an interest in this topic as well. We will be prototyping this in any case, but in my view (and others) it could be a really great feature for Ada 202X more generally. **************************************************************** From: Randy Brukardt Sent: Thursday, October 5, 2017 8:08 PM This seems insufficiently baked for the ARG to work on it. (If we had unlimited time, it would be different, but this looks like something that could take all of the time available.) In particular, the comment in the !discussion: Tracking validity of components is more challenging, and perhaps additional rules over and above those given above will be needed. In particular, we have proposed rules for dynamically-indexed array components. They will need to be studied and used in examples to know whether they are adequate and usable. I don't think the ARG is the place to develop these sorts of rules. In addition to the proposed rules, there need to be rules for dealing with this in generics. (The word "generic" does not appear in this proposed AI, so it's pretty clear that there is an enormous amount missing). For instance, how do the parameter aliasing restrictions get enforced on generic formal types? Assume-the-worst in generic bodies would be wildly incompatible, and not enforcing them would seem to destroy the entire idea. Not allowing generics at all doesn't seem viable, either, nor is abandoning the contract model (I suspect SPARK does the last, but that won't fly in Ada). Similarly, how are the restrictions enforced when a type with a component of a tracked access type is passed as a generic actual type? Assume-the-worse would ban most existing assignments -- no way. Finally, I'm terrified at the notion of automatic deallocation of objects. In the case of a general access type (or generic formal type), the compiler cannot know at compile-time the proper storage pool to use to deallocate the original object. (The object may not have been allocated at all [it could have been an aliased global object], or could have been implicitly converted from some other type.) Besides, most anonymous access types have no storage pool associated, so how could it be deallocated automatically? I think automatic deallocation has to be limited to pool-specific types (where the correct pool is well-defined); other places where it would be needed ought to be illegal. [Yes, any use of Unchecked_Deallocation on a general access type is dangerous, but such uses are directly visible in the code, involve the name "Unchecked", and as such should be carefully checked in code reviews. Stuff that happens automatically is much, much harder to review.] These sorts of (big) issues tell me that this idea needs more work before the ARG starts on it. There clearly need to be more examples created, and possibly a prototype implementation. The basic idea seems sound, but it is hard to say if it is usable in Ada at this point, and it seems to be the sort of experimental idea that isn't ready for Standardization. **************************************************************** From: Jeff Cousins Sent: Friday, October 6, 2017 5:06 AM Interesting, and clearly some work has gone into this. But... It says “The Ownership aspect is a relatively simple idea” but it looks quite complicated to me, most of it is more like what I would expect a static analysis tool to do than a compiler. E.g. it has a section on control flow - if we were to require flow analysis of a compiler, then there are more useful things that it could look for, such as checking that all paths through a subprogram assign values to all the out parameters, and that all paths through a function return a value. It seems to have an Ada 83 view of access types, i.e. that they only point to objects allocated on the heap. Randy’s reply elaborates on this. Also, access types are currently classified as being elementary types, I think that would have to be revised if they have to carry around hidden state information with them. **************************************************************** From: Tucker Taft Sent: Friday, October 6, 2017 10:31 AM > Interesting, and clearly some work has gone into this. But... > > It says “The Ownership aspect is a relatively simple idea” but it looks > quite complicated to me, most of it is more like what I would expect a > static analysis tool to do than a compiler. What I meant was that the *idea* is simple, but the compiler support is indeed not simple. > E.g. it has a section on control flow - if we were to require flow analysis > of a compiler, then there are more useful things that it could look for, > such as checking that all paths through a subprogram assign values to all > the out parameters, and that all paths through a function return a value. You are correct that this would introduce some control-flow based checks, and there are many checks that could take advantage of that. In fact, almost every Ada compiler already does some amount of control flow, merely to provide helpful warnings. This would be the first time we would make legality depend on that, which is worth considering! > It seems to have an Ada 83 view of access types, i.e. that they only point > to objects allocated on the heap. Randy’s reply elaborates on this. This is a good point. The TBD "conversion" rules were meant to cover that, in particular a value of such an "owning" access type would only be allowed to point to an actual heap object, of the correct storage pool. 'Access or 'Unchecked_Access would be illegal for such types, as would be conversions from types that are not "owning" access types or that use a different Storage_Pool. > Also, access types are currently classified as being elementary types, I > think that would have to be revised if they have to carry around hidden > state information with them. The rules are intended to mean that the hidden state is only needed at compile time. But even now, there are access types that are multiple words, e.g. access-to-protected-subprogram, and in some implementations, access-to-unconstrained-array. So being multiple words doesn't necessarily imply being "composite" from a language point of view. **************************************************************** From: Randy Brukardt Sent: Monday, October 9, 2017 8:21 PM > Here is a new AI ... I noted that the !problem seems to concentrate totally on the problems of static analysis tools. It starts: "When attempting to prove properties about a program, ..." I'd argue that a large proportion of Ada users aren't planning to do that anytime soon. And arguably, static analysis tools can solve their own problems (as you say is being done with SPARK). So what's the value of this proposal to the average Ada user? Static analysis features like Preconditions and Predicates are evaluated at runtime in Ada, so they are valuable to Ada programmers even if they never even have imagined a static analysis tool. I think that is an important part of any major feature added to Ada. --- > You are correct that this would introduce some control-flow based > checks, and there are many checks that could take advantage of that. > In fact, almost every Ada compiler already does some amount of control > flow, merely to provide helpful warnings. This would be the first > time we would make legality depend on that, which is worth considering! So far as I know, only GNAT uses front-end flow analysis for "helpful warnings". I've never seen such warnings in other Ada compilers (most compiler warnings aren't very helpful!). Janus/Ada is probably extreme in this regard, but it could not enforce any rules on anything larger than an expression. Flow analysis is done only in the optimizer, only if the optimizer is on, and can only produce warnings - it has to produce an executable program by that point; the data structures needed to produce detailed errors are long gone by that point. I would have to discard 40% of the compiler and rewrite it to have any way to do any serious flow analysis in the front-end. (Hint: I'm not going to do that.) Additionally, if we require flow analysis for Legality Rules, we would have to define precisely what flow analysis is required and not required for that purpose. Besides being a lot of work, it certainly would be different than what any actual compiler did for such things, meaning that every vendor would have to bifurcate their flow work (much as every vendor has to split compile-time expression evaluation into static and non-static parts). That's going to be a lot of work even for compiler's that already do the needed analysis. It also seems that this sort of requirement would also require abandoning all pretense of a contract model for generics. (You all know where I stand on that.) It wouldn't make much sense to require assume-the-worst in a generic body, as that would essentially make all code illegal; nor does it make much sense to use a bunch of runtime checks for this (or any such model). Finally, if we were to start requiring flow analysis, there are lots of things that would be better uses for it than this proposal: uninitialized stand-alone object prevention seems to lead the list (as Jeff noted). It might be that this idea is best left as a SPARK feature, since it doesn't seem to fit into the existing model of what an Ada compiler is required to do. ****************************************************************