!standard 03.03.01(20)                               04-06-07  AI95-00373/01
!class binding interpretation 04-02-05
!status received 04-01-17
!priority Low
!difficulty Hard
!subject Undefined discriminants caused by loose order of init requirements

!summary

!question

!recommendation

!wording

Replace 3.3.1(18/1 - 20) with

   3. The object is created, and, if there is not an initialization
      expression, any per-object expressions (see 3.8) are elaborated and
      any implicit initial values for the object or for its subcomponents
      are obtained as determined by the nominal subtype. Any initial values
      (whether explicit or implicit) are assigned to the object or to the
      corresponding subcomponents. As described in 5.2 and 7.6, Initialize
      and Adjust procedures can be called.

   For the third step above, evaluations and assignments are performed in
   an arbitrary order subject to the following restrictions:

      - Assignment to any part of the object is preceded by the evaluation
        of the value that is to be assigned.

      - The evaluation of a default_expression that includes the name of a
        discriminant is preceded by the assignment to that discriminant.

      - The evaluation of the default_expression for any component that
        depends on a discriminant is preceded by the assignment to that
        discriminant.

   Furthermore, a component of an object is said to *require late
   initialization* if it has an access discriminant value constrained by a
   per-object expression, or if it has an initialization expression which
   includes a name denoting the current instance of the type or denoting an
   access discriminant. The assignments of the fourth step for any
   components not requiring late initialization must precede the
   evaluations of the third step for any components requiring late
   initialization. If two components both require late initialization,
   then the assignment of the fourth step for the component occurring
   earlier in the order of the component declarations must precede the
   evaluation of the third step for the component occurring later.
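A minimal sketch of what the proposed ordering guarantees, written as a self-contained main program for an Ada 2005 or later compiler (it uses raise ... with). All names (Late_Init_Demo, T1, Peek_D, Peek_F3, F, Obj) are hypothetical and only follow the shape of the type in technical note 3. Component F1 requires late initialization because its access discriminant is constrained by the per-object expression T'Access; F3 does not, because its default names only the scalar discriminant D. Under the proposed wording, D and F3 are both assigned before the defaults of F1's subcomponents are evaluated, so peeking at them through Owner is well defined; the current 3.3.1(20) would allow an order in which these reads see uninitialized data.

```ada
with Ada.Text_IO;

procedure Late_Init_Demo is
   type T;

   --  Hypothetical accessors used by the defaults of T1 below; they read
   --  the enclosing T object through the access discriminant.
   function Peek_D  (Owner : access T) return Integer;
   function Peek_F3 (Owner : access T) return Integer;

   --  Hypothetical initializer for F3; it depends only on D.
   function F (D : Positive) return Integer;

   type T1 (Owner : access T) is limited record
      Seen_D  : Integer := Peek_D (Owner);
      Seen_F3 : Integer := Peek_F3 (Owner);
   end record;

   type T (D : Positive) is limited record
      F1 : T1 (T'Access);      --  requires late initialization
      F2 : String (1 .. D);    --  depends on D, but is not "late"
      F3 : Integer := F (D);   --  not "late": D is a scalar discriminant
   end record;

   --  Bodies come after T is complete, so no body freezes the
   --  incomplete view of T.
   function F (D : Positive) return Integer is
   begin
      return D * 10;
   end F;

   function Peek_D (Owner : access T) return Integer is
   begin
      return Owner.D;
   end Peek_D;

   function Peek_F3 (Owner : access T) return Integer is
   begin
      return Owner.F3;
   end Peek_F3;

   Obj : T (D => 4);
begin
   Ada.Text_IO.Put_Line ("Seen_D  =" & Integer'Image (Obj.F1.Seen_D));
   Ada.Text_IO.Put_Line ("Seen_F3 =" & Integer'Image (Obj.F1.Seen_F3));

   --  Under the proposed rule, D and F3 are assigned before F1's
   --  defaults are evaluated, so both values must be the final ones.
   if Obj.F1.Seen_D /= 4 or else Obj.F1.Seen_F3 /= 40 then
      raise Program_Error with "late component saw uninitialized data";
   end if;
end Late_Init_Demo;
```

On a compiler that follows the proposed ordering, the program reports 4 and 40; the explicit check raises Program_Error if the compiler chose an order that the new wording forbids.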
!example

   procedure Test is
      type Inner;
      type Outer;

      function Flag_Init (X : access Inner) return Boolean;

      type Inner (Discrim : access Outer) is limited record
         Flag : Boolean := Flag_Init (Inner'Access);
      end record;

      type Type_With_Lots_Of_Interesting_Components is
        -- contains tasks, discriminants, access values, protected records,
        -- controlled types, or whatever gives you heartburn
        ... ;

      type Outer is limited record
         F1 : Inner (Outer'Access);
         F2 : Type_With_Lots_Of_Interesting_Components;
         F3 : Inner (Outer'Access);
      end record;

      procedure Do_All_Sorts_Of_Things
        (Interesting : in out Type_With_Lots_Of_Interesting_Components)
        is separate;
        --
        -- abort task components, dereference access values, etc.

      function Flag_Init (X : access Inner) return Boolean is
      begin
         Do_All_Sorts_Of_Things (X.Discrim.all.F2);
         return True;
      end Flag_Init;

      Problematic : Outer;
        --
        -- If the F2 field is not initialized ahead of both the F1 and F3
        -- fields, then Do_All_Sorts_Of_Things will be invoked on a record
        -- containing uninitialized tasks, discriminants, access values, etc.

      procedure Do_Something is separate;
   begin
      Do_Something;
   end Test;

!discussion

An implementation is currently given too much freedom in choosing the
order in which components are initialized. The problem is illustrated by
the preceding example (see also Bob Duff's initial description of the
problem).

This proposal does not completely solve the problem of evaluating
uninitialized components (e.g. discriminants, access values, tasks, etc.),
but it greatly reduces the chances of inadvertently introducing such a
problem, either by writing new Ada source or by compiling existing source
with another compiler.

If a type has two components which both "require late initialization",
then one of them is going to be initialized first, and problems may arise
if this initialization involves evaluation of the second component (or any
part thereof). Portability is improved by nailing down the order in which
such components are initialized.
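The portability point about two components that both require late initialization can be sketched the same way (Ada 2005 or later; Late_Order_Demo, Pair, Start, and After_First are hypothetical names). Both defaults name the current instance Pair'Access, so both First and Second require late initialization, and the proposed declaration-order rule is what guarantees that After_First sees the final value of First.

```ada
with Ada.Text_IO;

procedure Late_Order_Demo is
   type Pair;

   --  Hypothetical initializers. Each component default below passes
   --  the current instance, so both components require late
   --  initialization.
   function Start (X : access Pair) return Integer;
   function After_First (X : access Pair) return Integer;

   type Pair is limited record
      First  : Integer := Start (Pair'Access);        --  late
      Second : Integer := After_First (Pair'Access);  --  late, declared later
   end record;

   function Start (X : access Pair) return Integer is
   begin
      --  X is passed only so that First requires late initialization.
      return 10;
   end Start;

   function After_First (X : access Pair) return Integer is
   begin
      --  Reads a sibling component through the access value.
      return X.First + 1;
   end After_First;

   P : Pair;
begin
   Ada.Text_IO.Put_Line ("First  =" & Integer'Image (P.First));
   Ada.Text_IO.Put_Line ("Second =" & Integer'Image (P.Second));

   --  Under the proposed rule, First is assigned before the default for
   --  Second is evaluated, so Second must equal First + 1.
   if P.Second /= P.First + 1 then
      raise Program_Error with "Second was evaluated before First was assigned";
   end if;
end Late_Order_Demo;
```

If the two component declarations were swapped, the same rule would require Second to be assigned first, and After_First would then read First before it is initialized; that is exactly the residual case raised in technical note 6.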
Technical notes:

1) A discriminant never requires late initialization because RM 8.3(17)
   implies that the current instance of a type cannot be named in the
   discriminant part of the type.

2) The reference to "the order of the component declarations" is intended
   to echo 7.6(12).

3) Given this example

      type T (D : Some_Discrete_Type) is limited record
         F1 : T1 (T'Access);
         F2 : T2 (D);
         F3 : T3 := F (D);
      end record;

   F2 and F3 do not "require late initialization". We want F1 initialized
   last in this case. Evaluation of a scalar discriminant does not cause a
   component to "require late initialization".

4) Given this example

      type Tt (Dd : Some_Type := Some_Value) is record
         Ff : Some_Type := Dd;
      end record;

   the current wording of 3.3.1(20) allows the following (bad) order:

      a) Evaluate the initial value of the Dd component.
      b) Evaluate the initial value of the Ff component.
      c) Assign the value computed in step a to the Dd component.
      d) Assign the value computed in step b to the Ff component.

   The problem is that step b involves the evaluation of a component which
   is not initialized until step c. No reasonable implementation would
   choose this order, but the language definition shouldn't rely on that.

5) The former steps #3 and #4 (i.e. 3.3.1(18/1 and 19)) are now merged
   into one step. It did not make sense to leave them as separate steps
   because 3.3.1(15) explicitly states that the steps are to be performed
   sequentially.

6) Should it be stated explicitly that the evaluation-before-initialization
   cases that this proposal does not prevent (i.e. cases involving two or
   more components which require late initialization) result in erroneous
   execution? Would an AARM note be appropriate?

--!corrigendum

!ACATS test

!appendix

!subject Undefined discriminants caused by loose order of init requirements
!reference RM95-3.3.1(20)
!from Bob Duff
!problem

The rules of 3.3.1(20) are too lax -- they allow one to refer to
uninitialized discriminants.
Here's an example:

   type String_Ptr is access all String;

   type Rec(Name: String_Ptr) is limited private;
   function Init(X: access Rec) return Boolean;

   type Rec(Name: String_Ptr) is limited record
      Comp: Boolean := Init(Rec'Access);
   end record;

   function Init(X: access Rec) return Boolean is
   begin
      Put_Line(X.all.Name.all); -- Is this erroneous?
      return True;
   end Init;

   Thing: Rec(Name => new String'("Thing"));

This example is similar to one that came from real code (written by me).
It looks silly here, but I thought I had good reasons...

Anyway, the AdaMagic compiler generated code to call Init *before*
initializing Thing.Name. Init thus refers to that discriminant in its raw
undefined state. This bad behavior does not seem to be forbidden by the
RM. GNAT apparently chose a better order.

We plan to fix our compiler (in fact, I think Tuck already did so), but
it seems like this is a hole in the RM. You're not supposed to be able to
get at undefined discriminants without using chapter-13-ish features.

The problem here is that 3.3.1(20) talks about direct references to
discriminants, forgetting that one can get at the discriminants via a
pointer as shown above.

We should probably add something like "All discriminants are initialized
before evaluating any expression containing the name of the current
instance..."

The alternative would be to declare the above erroneous. I don't much
like that -- after all, I tripped over this by accident, and I wasn't
messing around with low-level chap-13 junk.

P.S. I'm impressed that Tucker tracked down the cause of this bug
quickly. It occurred in a 150,000-line program.

****************************************************************

From: Tucker Taft
Sent: Saturday, January 17, 2004  4:34 PM

Discriminants aren't the only problem. You have access to the whole
record. Various pointer fields might have stack junk in them, and
dereferencing them would presumably be erroneous.
To be really safe, we would have to initialize all pointer fields to null
before evaluating any default initial expressions that involve
enclosing-rec'access. But what about pointers that are of a "not null"
subtype? They can presumably be assumed to be non-null under normal
circumstances, so no access check is required before dereferencing them.
And what about plain old integer fields that are supposedly, say, integer
range 1..100? Do we have to be sure they are pre-initialized to some
in-range value before passing blah'access? The overall implications are
pretty worrisome.

Java "solves" this problem by first initializing everything to 0, null,
etc., and then doing further initialization. But as indicated above, that
doesn't solve the problem for Ada because a zero-ish value is not
reasonable for all subtypes.

I think we almost have to say that dereferencing a value produced by
passing enclosing_rec'Access may lead to erroneous execution if the
dereference occurs prior to the completion of default initialization. We
could special-case discriminants, but I'm not convinced that is really
doing the programmer a big favor. They have to treat these access values
with "kid" gloves in any case.

Unfortunately, I can't think of a way to protect against the problem,
short of disallowing the use of enclosing_rec'access as an actual
parameter in a function call in a component default initial expression.

> ...
> The alternative would be to declare the above erroneous. I don't much
> like that -- after all, I tripped over this by accident, and I wasn't
> messing around with low-level chap-13 junk.

Unfortunately discriminants are only the tip of the iceberg...

> P.S. I'm impressed that Tucker tracked down the cause of this bug
> quickly. It occurred in a 150,000-line program.

Aw shucks.

****************************************************************

From: Robert A. Duff
Sent: Sunday, January 18, 2004  11:37 PM

> Discriminants aren't the only problem. You have access to
> the whole record.
Yes, I see that now, but I still think it's a good idea to special-case
discriminants. 3.3.1(20) already goes to some trouble to make sure you
can't refer to discriminants before they've been initialized. Other
components seem less worrisome, somehow.

I'm not sure I believe the above argument...

If I were the boss, initialization would happen in order (textual order
of the component declarations), for all record types. I doubt that's
going to fly. ;-)

At least the user could know which components must be initialized.
Maybe we could add such a rule, but only for records where 'Access (or
'Unchecked_Access) is used in a potentially damaging way.

We certainly don't want to make my example illegal. The main reason for
passing the 'Access was to create two records pointing at each other,
i.e., two record types declared in a decoupled way that actually
represent a single concept in the programmer's mind. That seems like a
legitimate thing to do. The problem was that in addition to saving that
pointer away, I took a quick peek at one of the discriminants. We can't
tell at compile time when that's going to happen.

****************************************************************

From: Gary Dismukes
Sent: Monday, January 19, 2004  4:11 PM

> Yes, I see that now, but I still think it's a good idea to special-case
> discriminants. 3.3.1(20) already goes to some trouble to make sure you
> can't refer to discriminants before they've been initialized. Other
> components seem less worrisome, somehow.

3.3.1(20) goes to some trouble, but it seems that the current wording
doesn't even handle the cases of discriminant defaults properly, or at
least the wording is sloppy, because the last sentence still allows
component and discriminant assignments to happen after all of the
evaluations. In any case, I agree there are problems here, and it would
be nice to fix the wording to address more cases.
> If I were the boss, initialization would happen in order (textual order
> of the component declarations), for all record types. I doubt that's
> going to fly. ;-)
> At least the user could know which components must be initialized.
> Maybe we could add such a rule, but only for records where 'Access (or
> 'Unchecked_Access) is used in a potentially damaging way.

I think it would be reasonable to add additional constraints on
evaluation and initialization along the lines of what's already done for
delaying Initialize calls for controlled components constrained by
per-object expressions as specified in 7.6(12). The AARM says:

   12.b The fact that Initialize is done for components with access
        discriminants after other components allows the Initialize
        operation for a component with a self-referential access
        discriminant to assume that other components of the enclosing
        object have already been properly initialized. For multiple such
        components, it allows some predictability.

The case of normal component initializations seems similar to this when
per-object expressions are involved in defaults. So it would seem
reasonable to add rules requiring components initialized by these special
expressions to delay evaluation until earlier components that don't
involve per-object expressions have all been initialized.

****************************************************************

From: Randy Brukardt
Sent: Monday, June 7, 2004  7:40 PM

The AI doesn't contain a summary, question, or recommendation.
Recommendation can be "(See summary.)", but the other two are required.
[Editor's note: this is about version /01 of the AI.]

The last paragraph of the new wording refers to the "fourth step", but
there is no fourth step anymore.

I think that Note 6 needs to be stated somewhere, at least in the AARM.
But it really is something users ought to know, so explicit mention in
the Standard would be a good idea.

****************************************************************
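The existing 7.6(12) rule that Gary cites, which the proposed wording is modeled on, can be observed directly with controlled components. This is a minimal sketch for Ada 2005 or later (Ada 95 does not allow a nested extension of Limited_Controlled); the names Initialize_Order_Demo, Plain, Watcher, Ready, and Saw_Ready are hypothetical. Although W is declared before P, W has an access discriminant constrained by the per-object expression Outer'Access, so 7.6(12) requires Initialize to be applied to P first.

```ada
with Ada.Finalization;
with Ada.Text_IO;

procedure Initialize_Order_Demo is
   type Outer;

   --  Hypothetical controlled component with no access discriminant.
   type Plain is new Ada.Finalization.Limited_Controlled with record
      Ready : Boolean := False;
   end record;
   procedure Initialize (P : in out Plain);

   --  Hypothetical controlled component whose access discriminant is
   --  constrained below by the per-object expression Outer'Access.
   type Watcher (Owner : access Outer) is
     new Ada.Finalization.Limited_Controlled with record
        Saw_Ready : Boolean := False;
     end record;
   procedure Initialize (W : in out Watcher);

   type Outer is limited record
      W : Watcher (Outer'Access);  --  declared first ...
      P : Plain;                   --  ... but Initialize'd first, per 7.6(12)
   end record;

   procedure Initialize (P : in out Plain) is
   begin
      P.Ready := True;
   end Initialize;

   procedure Initialize (W : in out Watcher) is
   begin
      --  Relies on 7.6(12): P's Initialize has already run.
      W.Saw_Ready := W.Owner.P.Ready;
   end Initialize;

   Obj : Outer;
begin
   Ada.Text_IO.Put_Line ("Saw_Ready = " & Boolean'Image (Obj.W.Saw_Ready));
   if not Obj.W.Saw_Ready then
      raise Program_Error with "Initialize order violated 7.6(12)";
   end if;
end Initialize_Order_Demo;
```

The check mirrors AARM 7.6(12.b): Watcher's Initialize may assume that the other components of the enclosing object are already initialized. The proposal extends the same kind of guarantee to default initial values.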