!standard 7.5(8.1/2) 07-10-22 AI05-0067-1/01 !class binding interpretation 07-10-22 !status work item 07-10-22 !status received 07-09-03 !priority High !difficulty Hard !qualifier Error !subject More build-in-place issues !summary ** TBD ** !question (1) Consider the following example: with Ada.Tags; procedure Test is type T0 is tagged limited null record; type Enclosed (Enclosing : access T0'Class) is limited null record; type T1 (N : Natural) is new T0 with record Self_Ref : Enclosed (T1'Access); Discrim_Dependent : String (1 .. N); end record; type T2 is new T1 with record Ptr : Some_Access_Type; Lim : Some_Limited_Type; end record; F_Result_Tag : Ada.Tags.Tag; function F return T1 is begin return Result : T1 (Ident_Int (1234)) do F_Result_Tag := Result.Self_Ref.Enclosing.all'Tag; -- What value is assigned here? Some_Procedure (Result); end return; end F; type T0_Ref is access T0'Class; for T0_Ref'Storage_Pool use ... ; Allocator_Value : constant T0_Ref := new T2'(F with Ptr => Some_Function, Lim => Some_Other_Function); Allocated : T2 renames T2 (Allocator_Value.all); begin ...; end Test; Is F_Result_Tag assigned the value T1'Tag or T2'Tag? The first sentence of 6.5(8/2) seems to require T1'Tag. The first sentence of 3.9(3) seems to require T2'Tag. Assigning T1'Tag would introduce a violation of the fundamental rule that a tagged object's tag is immutable over the lifetime of the object. If the call to F is to be viewed as initializing a portion of an object of type T2, then it seems that T2'Tag should be assigned. Similarly, if the extended return statement makes a dispatching call, is the routine for T1 or T2 called? (Note that if it is T2, it is possible that the extension components are not yet initialized.) (2) Consider: type T is limited record F1 : Integer; F2 : access T := T'Unchecked_Access; end record; R : aliased T := (F1 => 5, F2 => <>); There does not appear to be anywhere in the Standard that says that R.F2 will be R'Access after R is created. The semantics of the aggregate are that an anonymous object is created and the components are assigned (4.3(5)); based on 4.3.1(19.1) and 8.6(17), the initial value of F2 should then be an access to an anonymous object. 7.5(8.1) says that for an aggregate of a limited type, the implementation shall not create a separate anonymous object, but that the aggregate shall be constructed directly in the target object. While this tells something about how the operation will be implemented (and implies that there is no extra assignment that would cause an Adjust/Finalize operation to be performed), it doesn't say anything to the effect that the anonymous object is "identified" with the target object, or the target object is treated as the anonymous object. The canonical semantics require that F2 be initialized to point to some anonymous object, and while 7.5(8.1) says we're not supposed to create an anonymous object, it doesn't say what will replace the anonymous object where that matters. What is the intent? !recommendation (See Summary.) !wording ** TBD ** !discussion For (1), the model is that (logically) there are two objects at the same location, and each has the appropriate tag. (Remember that the semantics are determined by the canonical semantics: build-in-place is just a permission.) Thus T1'Tag is assigned. However, dispatching in an extended return statement is dubious. (So is dispatching in Initialize, for reasons similar to those given in the question.) One could imagine trying to detect such dispatching as an error rather than worrying about the answer to this question. For (2), of course R.F2 points to R. But surely that should be said somehow. ** TBD ** --!corrigendum 7.6.1(17.1/1) !ACATS Test ** TBD ** !appendix From: Stephen W. Baird Sent: Tuesday, August 7, 2007 6:10 PM When an aggregate is used to initialize an object, initialization is performed "in place" in some cases. In other words, instead of creating a separate anonymous object for the aggregate and then assigning it to the object being initialized, the aggregate is built in place. This is required in some cases (7.5(8.1/2), 7.6(17.1/2)) and permitted in others (7.6(21/2)). When a function call is used as the ancestor expression of a build-in-place extension aggregate, the RM seems to impose contradictory requirements on the initialization of the tag component of the object being initialized. Consider the following example: with Ada.Tags; procedure Test is type T0 is tagged limited null record; type Enclosed (Enclosing : access T0'Class) is limited null record; type T1 (N : Natural) is new T0 with record Self_Ref : Enclosed (T1'Access); Discrim_Dependent : String (1 .. N); end record; type T2 is new T1 with record Ptr : Some_Access_Type; Lim : Some_Limited_Type; end record; F_Result_Tag : Ada.Tags.Tag; function F return T1 is begin return Result : T1 (Ident_Int (1234)) do F_Result_Tag := Result.Self_Ref.Enclosing.all'Tag; -- What value is assigned here? Some_Procedure (Result); end return; end F; type T0_Ref is access T0'Class; for T0_Ref'Storage_Pool use ... ; Allocator_Value : constant T0_Ref := new T2'(F with Ptr => Some_Function, Lim => Some_Other_Function); Allocated : T2 renames T2 (Allocator_Value.all); begin ...; end Test; Is F_Result_Tag assigned the value T1'Tag or T2'Tag? The first sentence of 6.5(8/2) seems to require T1'Tag. The first sentence of 3.9(3) seems to require T2'Tag. Assigning T1'Tag would introduce a violation of the fundamental rule that a tagged object's tag is immutable over the lifetime of the object. If the call to F is to be viewed as initializing a portion of an object of type T2, then it seems that T2'Tag should be assigned. If F.Result is used as the controlling operand of a dispatching call, one might equivalently ask which subprogram body is executed - T1's or T2's? Suppose that F_Result_Tag is assigned T2'Tag and this dispatching call executes the subprogram body associated with T2. This leads to another question about the state of the T2-specific portion of F.Result at the time of the dispatching call. Might this portion of F.Result be uninitialized at this point, resulting in an access-before-initialization problem? Certainly AI95-00373 (the "requires late initialization" AI) is only an incomplete solution to the problem of references to an object whose initialization is in progress. This part of the language definition is not completely bulletproof. Still, a user might reasonably conclude that it is safe, for example, to pass F.Result off to a procedure which then makes a dispatching function call using the operand it has been passed. The memory fault that results when the called function attempts to dereference a T2-specific access component which happens to contain the value "37" will probably not amuse the user. Within the statement list of an extended return statement, it seems quite reasonable to assume that the return object's initialization is complete. One solution would be to ignore the problem and hope that users won't do this sort of thing very often. Another solution would be to introduce a requirement that the initialization of an extended return object must be complete before beginning execution of the statement list of an extended return statement. This might sound like an obvious consequence of existing language rules, but the substance of this requirement stems from the inclusion of extension components which are not components of the function result type. Is this implementable? Consider an implementation which passes a storage-allocating callback routine into F in order to allocate storage for the function result. In the case of this particular example, that function might be called with the discriminant value of the result. It would then apply T2's (not T1's) discriminant-to-size mapping function and allocate the appropriate amount of storage out of the right storage_pool. It could also, before returning to F, initialize the tag component to T2'Tag and then initialize the T2-specific portions of the allocated object by calling Some_Function and Some_Other_Function. This would meet the proposed requirement and solve the aforementioned access-before-initialization problem, but it would introduce a somewhat peculiar possibility. Suppose that F repeatedly enters the extended return statement and exits it (e.g., via a goto statement) before finally returning a value. This would result in multiple calls to the storage-allocating callback routine. This would in turn mean that the initialization of the T2-specific portion of F.Result would be executed repeatedly. The RM does not currently permit this sort of oddness. Note that this only arises in the presumably rare case of exiting an extended return statement without returning. Showing how this requirement might be satisfied for one particular example does not demonstrate that it can always be satisfied, but the approach of initializing the extension components before, rather than after, the ancestor portion of the extension aggregate seems to generalize well. Returning to the original question about F_Result_Tag, the other alternative is that F_Result_Tag is assigned T1'Tag and a dispatching call executes the subprogram body associated with T1. After F has returned, the tag of the allocated object is changed from T1'Tag to T2'Tag. This form of mutable tags doesn't seem to introduce any *definitional* problems. There would be a problem if the language included some form of a general access type which could only designate objects with one particular tag, but it doesn't. The major problem with this approach seems to be a safety issue associated with usage patterns. Suppose, for example, that a customer has a pair of dispatching operations, Acquire and Release, that are supposed to match up in some way. A dispatching call to Acquire is executed while an object has one tag value; later, after the object's tag value has changed, a dispatching call to Release is executed. This is likely to lead to inconsistencies. This issue alone seems like enough to rule out the "mutable tags" model. Mutable tags would also introduce some peculiar corner cases, most of them involving the Unchecked_Access attribute. If F passes Result'Unchecked_Access off to a task which then renames The_Pointer_I_Was_Passed.Some_Dispatching_Operation, it will be odd when two calls to that rename end up executing different subprogram bodies. An implementation which resolved the call address at the point of the rename declaration would presumably be in error. In the case of optional build-in-place aggregates, the mutable tag model does introduce fewer portability problems. Dispatching calls will execute the same subprogram bodies if an aggregate is built in a a separate object vs. being built in place with a mutable tag. If it is built in place with an immutable tag, then different subprogram bodies may be executed. This is probably not a major concern as long as no access-before-initialization problems are introduced. Pascal Leroy has pointed out (in private correspondence) that there is an analogous issue in Ada95 because of RM95 7.6(17.1/1). An Initialize procedure plays the role of the Ada05 build-in-place function. Consider the following Ada95 example: with Ada.Finalization; package Pkg is type T1 is new Ada.Finalization.Controlled with null record; procedure Initialize (X1 : in out T1); type T2 is new T1 with ...; end Pkg; with Ada.Tags; with Text_Io; package body Pkg is procedure Initialize (X1 : in out T1) is begin Text_Io.Put_Line (Ada.Tags.Expanded_Name (T1'Class (X1)'Tag)); end Initialize; end Pkg; procedure Pkg.Maine is X : T2 := (T1 with null record); begin null; end Pkg.Maine; It seems clear that a call to Pkg.Maine should output the line "PKG.T2". It is true that RM95 4.3.2(7) talks about initialization "as for an object of the ancestor type", but this initialization does not include the tag component itself. Ada95 also has an analogous access-before-initialization issue - the Initialize procedure in the above example might make a dispatching call and end up accessing uninitialized T2-specific components. It seems that while this was possible in Ada95, it did not come up in practice. With the introduction of extended return statements and build-in-place function calls, this may be a more serious problem in Ada05. **************************************************************** From: Randy Brukardt Sent: Tuesday, August 7, 2007 6:55 PM Replying to Steve Baird (whose message is in some funny format that I can't seem to quote): As you point out in toward the end of this message, this is nothing new. The issue seems to revolve around (re)dispatch and direct access to tags. Re-dispatch is generally considered bad design. Indeed, Ada recognizes this by making it hard to write: you have to insert an explicit conversion to a class-wide type in order to force a redispatch. It seems even more dangerous in an initialization context, because many of the components may not yet be initialized. Direct uses of tag values are rare, and doing things based on them is also considered bad practice. Mostly likely, this issue has never come up because it is hard to write a redispatch by mistake, and the "bad practice" thoughts noted above keep people from doing it on purpose. All that seems to be new here is to note that an extended_return_statement is also an initialization context where redispatch is a very bad idea. The only real question in my mind is whether we ought to make a language change to discourage redispatch even further. Ideally, it should be banned outright in initialization contexts. But I don't think that can be done without compatibility issues (especially inside of Initialize). Steve's "solution" to the problem of putting even more requirements on initialization order doesn't really solve the problem of redispatch, and it seems to make finalization more risky. (If the components are finalized in reverse order, a redispatch in a Finalize routine has the same problem of potentially accessing components that have already been finalized; if the "arbitrary order" of the RM is used, you again cannot reason about what a redispatch will do in terms of accessing components. I realize that the Finalize case probably wouldn't make a program erroneous, but if you can't reason about the behavior of Finalize, the program might as well be erroneous because you can't write anything that will be guaranteed to work.) In any case, I don't think this will be a problem in extended_return_statements any more than in Initialize routines. It's unlikely that anyone will be writing redispatching calls in these contexts unless they're seriously confused (if you need to do something to the object as a whole, you have to do it via the whole-object Initialize/Adjust). This all leads me to think that the problem is with 6.5(8/2). This rule does not make sense for build-in-place, because we don't know the tag of the ultimate object. So I think we need to drop it or modify it such that it does not apply inside the function or to build-in-place objects. (Not sure which.) Steve didn't tell us why this issue came up; I doubt it was a user problem. It seems most likely to be an implementation problem which he is then trying to make into a mountain. **************************************************************** From: Tucker Taft Sent: Tuesday, August 7, 2007 8:22 PM My view is that you should not take "build in place" too literally from a semantic point of view. It is more of an implementation requirement, where the the semantics should generally work out as they would without build-in-place. In particular, the "outer" object of an extension aggregate doesn't really exist until the aggregate is fully evaluated. While evaluating the parent part, the parent part is a separate object, that just happens to overlap with the ultimate resting place of the outer object. So in my view, until you return from the function, the return object it is building is definitely of the result type of the function, and so the tag will be T1 until that moment. I also don't think 3.9(3) is really violated by this model, because the outer object doesn't really exist until all of the components of the aggregate have been evaluated. You certainly don't want anyone manipulating an object corresponding to the result of an aggregate until these evaluations have occurred. **************************************************************** From: Randy Brukardt Sent: Wednesday, August 8, 2007 12:19 PM > So in my view, until you return from the function, > the return object it is building is definitely of the > result type of the function, and so the tag will > be T1 until that moment. That's a more consistent view than whatever I was babbling about last night. I like it better, so I'm going to switch my answer to agree with Tucker, as it means that "optional" build-in place won't change the semantics of a return statement (and that would be very bad). OTOH, I don't think we're going to be seeing any tests of redispatching in extended_return_statements, so I doubt much that whatever implementations do will matter much in practice. **************************************************************** From: Stephen W. Baird Sent: Tuesday, August 14, 2007 7:58 PM > Steve didn't tell us why this issue came up; I doubt it was a user problem. It was an implementation question. I'm just trying to figure out how build-in-place extension aggregates are supposed to work. > .. the "outer" > object of an extension aggregate doesn't really exist > until the aggregate is fully evaluated. While > evaluating the parent part, the parent part is > a separate object, that just happens to overlap > with the ultimate resting place of the outer > object. > > So in my view, until you return from the function, > the return object it is building is definitely of the > result type of the function, and so the tag will > be T1 until that moment. > Given that Randy and I were unable to deduce this two-object model from the RM, it seems that some clarification is needed. Viewing an extension aggregate as two distinct objects residing at the same location certainly seems more appealing than viewing it as a single object whose tag value changes midway through its lifetime. This definition does, however, seem to involve some handwaving. In Ada terms, "residing at the same location" means at least the following: a) The metamorphosis of the first object into (a portion of) the second is accomplished without any Adjust/Finalize calls or any other user-visible side effects. b) An access value which refers to (a part of) the first object implicitly becomes a reference to the (corresponding part of) the second object at the end of the lifetime of the first object (i.e., at the time of the assignment of the first object to the second). This is an exception to the usual rule that access values which refer to an object after the end of the object's lifetime are dangling references. Similarly, references derived from such access values (e.g., a rename of Outstanding_Access_Value.all.Some_Component) are similarly transformed. c) Coextension ownership is handed off from the first object to the second. Should all of this be spelled out in an AARM note? The two-object approach also opens the door for a violation of the last sentence of 3.9.3(1/2): Because objects of an abstract type cannot be created, a dispatching call to an abstract subprogram always dispatches to some overriding body. If the ancestor type of an extension aggregate is abstract, then a dispatching call may end up attempting to execute an abstract subprogram: procedure Abstract_Call is package Pkg1 is type T1 is abstract tagged limited null record; function Abstract_Function (X1 : T1) return Integer is abstract; end Pkg1; function Dispatcher (Ref : access Pkg1.T1'Class) return Integer is begin return Pkg1.Abstract_Function (Ref.all); end Dispatcher; package Pkg2 is type T2 is abstract new Pkg1.T1 with record F1 : Integer := Dispatcher (T2'Access); end record; end Pkg2; package Pkg3 is type T3 is new Pkg2.T2 with record F2, F3 : Integer; end record; function Abstract_Function (X3 : T3) return Integer; end Pkg3; package body Pkg3 is function Abstract_Function (X3 : T3) return Integer is begin return 123; end Abstract_Function; end Pkg3; X : Pkg3.T3 := (Pkg2.T2 with 456, 789); begin null; end Abstract_Call; How is this supposed to work? When this example is compiled with our compiler, the tag field of the aggregate is never set to anything other than T3'Tag and Dispatcher's call dispatches to Pkg3.Abstract_Function. Is our implementation in error? **************************************************************** From: Tucker Taft Sent: Tuesday, August 7, 2007 x:xx PM ... As far as your question about an ancestor part that is specified by an abstract ancestor subtype_mark, the dynamic semantics are actually pretty different. There is no separate object created in that case. It simply says that the components that are not given by the association list are initialized by default. By contrast, when the ancestor part is specified by an expression, the expression is evaluated, which creates a separate object (with its own tag) that is then assigned to the ancestor part of the aggregate object. So I don't see the need to ever have a tag that identifies an abstract type, since there is no separate object when you specify an ancestor subtype_mark rather than an ancestor expression. I believe that the key thing is that in build-in-place, the assignment and associated adjust/finalize become no-ops at run-time. But if there are two objects in the "normal" semantics, then there are still two objects conceptually in the build-in-place semantics. It probably would be worth describing in more detail what sort of transformation happens upon a build-in-place "assignment," especially to an ancestor part. I think most of it could be in AARM implementation notes. It is important probably in normative words somewhere to say that access values remain valid, though for the ancestor part, they now denote a view of the "new" object whose type is that of the ancestor part. I.e., it is roughly equivalent to 'Access applied to a view conversion, e.g. T2(New_Obj_Of_Type_T3)'Access. **************************************************************** !topic Effect of build-in-place on "current instance" of a type !reference 7.5(8.1), 4.3(5), 8.6(17) !from Adam Beneschan 07-08-31 !discussion This is kind of a nitpick, but it's a bit disturbing to me. My understanding of the RM is that the "Dynamic Semantics" sections define what the effect of a legal Ada program will be when it runs. There are "Implementation Requirements" sections that impose additional constraints how things work, but those generally aren't supposed to affect the semantics, except perhaps that certain checks not take place, or things may not work exactly according to the Dynamic Semantics in some error or boundary cases, or some redundant Adjust/Finalize pairs may be eliminated. But is that the basic idea? If so, I'm not sure what to make of this: type T is limited record F1 : Integer; F2 : access T := T'Unchecked_Access; end record; R : aliased T := (F1 => 5, F2 => <>); I can't find anywhere in the RM that says that R.F2 will be R'Access after R is created. The semantics of the aggregate are that an anonymous object is created and the components are assigned (4.3(5)); based on 4.3.1(19.1) and 8.6(17), the initial value of F2 should then be an access to an anonymous object. 7.5(8.1) says that for an aggregate of a limited type, the *implementation* shall not create a separate anonymous object, but that the aggregate shall be constructed directly in the target object. In my opinion, this isn't quite strong enough; while this tells something about how the operation will be implemented (and implies that there is no extra assignment that would cause an adjust/finalize operation to be performed), it doesn't say anything to the effect that the anonymous object is "identified" with the target object, or the target object is treated as the anonymous object. The semantics of 4.3(5), 4.3.1(19.1) and 8.6(17) seem to require that F2 be initialized to point to some anonymous object, and while 7.5(8.1) says we're not supposed to create an anonymous object, it doesn't say what will replace the anonymous object for the purposes of the above RM sections. I'm sure we all know what's *supposed* to happen, and that R.F2 will point to R. But I'm not quite comfortable with that sort of fuzziness, especially given all the effort that those of you who work on the RM put into making sure things are precise. I think there's a little bit of language missing. Perhaps something needs to be added to 7.5(8.1) like, "The aggregate or function_call shall be constructed directly in the new object; and for the purposes of 8.6(17), the new object [or a corresponding subcomponent of the new object?] is considered to be the object 'associated with the execution that evaluates the usage name', when an expression that contains the current instance of the type [or the current instance of the type of a subcomponent?] is evaluated". **************************************************************** From: Tucker Taft Sent: Tuesday, September 4, 2007 4:12 PM Some discomfort with this area has already been expressed by some members of the ARG. The conclusion seems to be that we should be more explicit about what sorts of things are preserved by the "in-place" assignment. In particular, access values designating the "temporary" object will also designate the more "permanent" object. We'll factor in some of your issues when we tackle that. **************************************************************** From: Robert A. Duff Sent: Friday, September 28, 2007 1:03 PM > My view is that you should not take "build in place" > too literally from a semantic point of view. It > is more of an implementation requirement, where the > the semantics should generally work out as they would > without build-in-place. In particular, the "outer" > object of an extension aggregate doesn't really exist > until the aggregate is fully evaluated. While > evaluating the parent part, the parent part is > a separate object, that just happens to overlap > with the ultimate resting place of the outer object. > > So in my view, until you return from the function, > the return object it is building is definitely of the > result type of the function, and so the tag will > be T1 until that moment. FWIW, GNAT agrees with Tucker's interpretation. In the example below, A_Function returns T1, so it creates an object with T1'Tag. When a call to A_Function is used as the parent part of a T2 aggregate, that's still the case, but the final T2 object has T2'Tag. Makes sense to me. But dispatching within the return statement seems questionable. I wonder if it makes sense to make that a run-time error. (I can think of various ways to implement that without distributed overhead.) ---------------- with Ada.Text_IO; use Ada.Text_IO; with Ada.Tags; use Ada.Tags; package Ext_Agg_Test is type T0 is tagged limited null record; procedure Prim (X : T0); procedure Class_Wide (X : T0'Class); type T1 is new T0 with null record; procedure Prim (X : T1); type T2 is new T1 with null record; procedure Prim (X : T2); end Ext_Agg_Test; package body Ext_Agg_Test is procedure Prim (X : T0) is begin Put_Line ("Prim (T0)"); end Prim; procedure Class_Wide (X : T0'Class) is begin Put_Line ("Class_Wide (T0'Class)"); Put_Line (Expanded_Name (X'Tag)); Prim (X); Put_Line ("Class_Wide (T0'Class) done"); end Class_Wide; procedure Prim (X : T1) is begin Put_Line ("Prim (T1)"); end Prim; procedure Prim (X : T2) is begin Put_Line ("Prim (T2)"); end Prim; end Ext_Agg_Test; procedure Ext_Agg_Test.Test is function A_Function return T1 is begin return Result : T1 do if True then Class_Wide (Result); end if; end return; end A_Function; begin Put_Line ("Calling A_Function"); declare T1_Obj : T1 := A_Function; begin Put_Line ("A_Function returned"); Class_Wide (T1_Obj); end; New_Line; Put_Line ("Calling A_Function again"); declare T2_Obj : T2 := (A_Function with null record); begin Put_Line ("A_Function returned again"); Class_Wide (T2_Obj); end; end Ext_Agg_Test.Test; Here's the output: Calling A_Function Class_Wide (T0'Class) EXT_AGG_TEST.T1 Prim (T1) Class_Wide (T0'Class) done A_Function returned Class_Wide (T0'Class) EXT_AGG_TEST.T1 Prim (T1) Class_Wide (T0'Class) done Calling A_Function again Class_Wide (T0'Class) EXT_AGG_TEST.T1 Prim (T1) Class_Wide (T0'Class) done A_Function returned again Class_Wide (T0'Class) EXT_AGG_TEST.T2 Prim (T2) Class_Wide (T0'Class) done ****************************************************************