Version 1.12 of ais/ai-00373.txt


!standard 03.03.01(08)          05-10-12 AI95-00373/08
!standard 03.03.01(18/1)
!standard 03.03.01(19)
!standard 03.03.01(20)
!standard 04.08(07)
!standard 04.08(08)
!standard 04.08(10)
!standard 07.06(10)
!standard 07.06(11)
!class binding interpretation 04-02-05
!status Amendment 200Y 04-12-02
!status WG9 Approved 06-06-09
!status ARG Approved 9-0-1 04-11-21
!status work item 04-06-07
!status received 04-01-17
!priority Low
!difficulty Hard
!subject Undefined discriminants caused by loose order of init requirements
!summary
The "arbitrary order" of component initialization referred to in 3.3.1(20) gives the implementation too much freedom. In particular, it allows the implementation to choose an ordering which leads to problems with uninitialized discriminants, uninitialized access values, etc. This problem is reduced, but not eliminated, by imposing additional restrictions on the order that an implementation may choose.
"Initialized by default" is defined, and it is used to define when Initialize is called. (And will be used in other AIs to define the initialization of other kinds of objects as well.)
!question
The rules of 3.3.1(20) are too lax -- they allow one to refer to uninitialized discriminants, uninitialized access values, etc. Was this intended? (No.)
We want Initialize always to be called for an object without an initialization expression (other than aggregates), but this is done on a case-by-case basis. Wouldn't it be better to do this with a global rule? (Yes.)
!recommendation
(See summary.)
!wording
Insert after 3.3.1(8):
A component of an object is said to require late initialization if it has an access discriminant value constrained by a per-object expression, or if it has an initialization expression that includes a name denoting the current instance of the type or denoting an access discriminant.
Replace 3.3.1(18/1 - 20) with
3. The object is created, and, if there is not an initialization expression, the object is initialized by default. When an object is initialized by default, any per-object expressions (see 3.8) are elaborated and any implicit initial values for the object or for its subcomponents are obtained as determined by the nominal subtype. Any initial values (whether explicit or implicit) are assigned to the object or to the corresponding subcomponents. As described in 5.2 and 7.6, Initialize and Adjust procedures can be called.
For the third step above, evaluations and assignments are performed in an arbitrary order subject to the following restrictions:
- Assignment to any part of the object is preceded
by the evaluation of the value that is to be assigned.
- The evaluation of a default_expression that includes the name of
a discriminant is preceded by the assignment to that discriminant.
- The evaluation of the default_expression for any component that
depends on a discriminant is preceded by the assignment to that discriminant.
- The assignments to any components, including implicit components,
not requiring late initialization must precede the initial value evaluations for any components requiring late initialization; if two components both require late initialization, then assignments to parts of the component occurring earlier in the order of the component declarations must precede the initial value evaluations of the component occurring later.
Replace 4.8(7-10) with
For the evaluation of an initialized allocator, the evaluation of the qualified_expression is performed first. An object of the designated type is created and the value of the qualified_expression is converted to the designated subtype and assigned to the object.
For the evaluation of an uninitialized allocator, the elaboration of the subtype_indication is performed first. Then:
If the designated type is elementary, an object of the designated subtype is created and any implicit initial value is assigned;
If the designated type is composite, an object of the designated type is created with tag, if any, determined by the subtype_mark of the subtype_indication. This object is then initialized by default (see 3.3.1) using the subtype_indication to determine its nominal subtype. A check is made that the value of the object belongs to the designated subtype. Constraint_Error is raised if this check fails. This check and the initialization of the object are performed in an arbitrary order.
Replace 7.6(10-11) with
During the elaboration or evaluation of a construct that causes an object to be initialized by default, for every controlled subcomponent of the object that is not assigned an initial value (as defined in 3.3.1), Initialize is called on that subcomponent. Similarly, if the object that is initialized by default as a whole is controlled, Initialize is called on the object.
For an extension_aggregate whose ancestor_part is a subtype_mark denoting a controlled subtype, the Initialize procedure of the ancestor type is called, unless that Initialize procedure is abstract.
!example
    procedure Test is
        type Inner;
        type Outer;

        function Flag_Init (X : access Inner) return Boolean;

        type Inner (Discrim : access Outer) is limited
            record
                Flag : Boolean := Flag_Init (Inner'access);
            end record;

        type Type_With_Lots_Of_Interesting_Components is
            -- contains tasks, discriminants, access values, protected records,
            -- controlled types, or whatever gives you heartburn
            ... ;

        type Outer is limited
            record
                F1 : Inner (Outer'access);
                F2 : Type_With_Lots_Of_Interesting_Components;
                F3 : Inner (Outer'access);
            end record;

        procedure Do_All_Sorts_Of_Things
            (Interesting : in out Type_With_Lots_Of_Interesting_Components)
            is separate;
            --
            -- abort task components, dereference access values, etc.

        function Flag_Init (X : access Inner) return Boolean is
        begin
            Do_All_Sorts_Of_Things (X.Discrim.all.F2);
            return True;
        end Flag_Init;

        Problematic : Outer;
            --
            -- If F2 field is not initialized ahead of both the F1 and F3 fields,
            -- then Do_All_Sorts_Of_Things will be invoked on a record containing
            -- uninitialized tasks, discriminants, access values, etc.

        procedure Do_Something is separate;
    begin
        Do_Something;
    end Test;
!discussion
An implementation is currently given too much freedom in choosing the order in which components are initialized. The problem is illustrated by the preceding example (see also Bob Duff's initial description of the problem in the Appendix).
This proposal does not completely solve the problem of evaluating uninitialized components (e.g. discriminants, access values, tasks, etc), but it greatly reduces the chances of inadvertently introducing such a problem, either by writing new Ada source or by compiling existing source with another compiler.
The problem cannot be completely solved because, if a type has two components which both "require late initialization", then necessarily one of them is going to be initialized first, and problems may arise if this initialization involves evaluation of the second component (or any part thereof). Portability is improved by nailing down the order in which such components are initialized.
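The ordering among components that require late initialization can be sketched as follows. This is only an illustration under stated assumptions: the names Two_Late_Components, Init_First, and Init_Second are hypothetical and are not part of the proposal. Both defaults name the current instance, so both components require late initialization, and the later one may rely on the earlier one having been assigned.

```ada
with Ada.Text_IO;

procedure Two_Late_Components is

   type Outer;

   --  Hypothetical helpers, not part of the proposal.
   function Init_First  (X : access Outer) return Character;
   function Init_Second (X : access Outer) return Character;

   type Outer is limited record
      --  Both defaults name the current instance, so both components
      --  require late initialization.
      First  : Character := Init_First  (Outer'Access);
      Second : Character := Init_Second (Outer'Access);
   end record;

   function Init_First (X : access Outer) return Character is
   begin
      --  Must not read X.Second: it is declared later, so it has not
      --  been initialized when this function runs.
      return 'A';
   end Init_First;

   function Init_Second (X : access Outer) return Character is
   begin
      --  Under the proposed rule, First has already been assigned.
      return X.First;
   end Init_Second;

   Obj : Outer;

begin
   Ada.Text_IO.Put_Line ("Second = " & Obj.Second);
end Two_Late_Components;
```

On an implementation that follows the proposed ordering, Second should end up holding the value assigned to First. Swapping the roles (having Init_First read X.Second) would be one of the cases the proposal does not prevent.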
Technical notes:
1) A discriminant never requires late initialization because RM 8.3(17) implies
that the current instance of a type cannot be named in the discriminant part of the type.
2) The reference to "the order of the component declarations" is intended to
echo 7.6(12).
3) Consider:
    type T (D : Some_Discrete_Type) is limited
        record
            F1 : T1 (T'access);
            F2 : T2 (D);
            F3 : T3 := F (D);
        end record;
Here, F2 and F3 do not "require late initialization". We want F1 initialized last in this case. Evaluation of a scalar discriminant does not cause a component to "require late initialization".
4) Consider:
    type Tt (Dd : Some_Type := Some_Value) is
        record
            Ff : Some_Type := Dd;
        end record;
The Ada 95 Standard wording of 3.3.1(20) allows the following (bad) order:
    a) Evaluate the initial value of the Dd component.
    b) Evaluate the initial value of the Ff component.
    c) Assign the value computed in step a to the Dd component.
    d) Assign the value computed in step b to the Ff component.
The problem is that step b involves the evaluation of a component which is not initialized until step c. No reasonable implementation would choose this order, but the language definition shouldn't rely on that.
5) The former steps #3 and #4 (i.e. 3.3.1(18/1 and 19)) are now merged into
one step. It did not make sense to leave them as separate steps because 3.3.1(15) explicitly states that the steps are to be performed sequentially.
6) It does not need to be stated explicitly that the
evaluation-before-initialization cases that this proposal does not prevent (i.e. cases involving two or more components which require late initialization) result in erroneous execution. This is already a consequence of existing rules (13.9.1(8) and the first sentence of 13.9.1(4), particularly the clause "and any explicit or default initializations have been performed"). Still, an AARM note might be appropriate.
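Technical note 3 can be made concrete with a small sketch. It is an assumption-laden illustration, not text from the proposal: the names Note_3_Sketch, Sum, and Twice are hypothetical, Natural stands in for Some_Discrete_Type, and F1 is made to require late initialization through a default that names the current instance rather than through an access discriminant.

```ada
with Ada.Text_IO;

procedure Note_3_Sketch is

   type T;

   --  Hypothetical helpers, not part of the proposal.
   function Sum   (X : access T) return Natural;
   function Twice (N : Natural)  return Natural;

   type T (D : Natural) is limited record
      F1 : Natural := Sum (T'Access);  --  names the current instance: late
      F2 : String (1 .. D);            --  constrained by D: not late
      F3 : Natural := Twice (D);       --  default names D: not late
   end record;

   function Twice (N : Natural) return Natural is
   begin
      return 2 * N;
   end Twice;

   function Sum (X : access T) return Natural is
   begin
      --  Relies on D and F3 having been assigned before F1's default
      --  expression is evaluated.
      return X.F2'Length + X.F3;
   end Sum;

   Obj : T (D => 3);

begin
   Ada.Text_IO.Put_Line ("F1 =" & Natural'Image (Obj.F1));
end Note_3_Sketch;
```

With the proposed restrictions, D is assigned before the default of F3 is evaluated, and F3 is assigned before the default of F1 is evaluated, so F1 should be computed from a length of 3 and an F3 of 6.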
!corrigendum 3.3.1(8)
Insert after the paragraph:
The subtype_indication or full type definition of an object_declaration defines the nominal subtype of the object. The object_declaration declares an object of the type of the nominal subtype.
the new paragraph:
A component of an object is said to require late initialization if it has an access discriminant value constrained by a per-object expression, or if it has an initialization expression that includes a name denoting the current instance of the type or denoting an access discriminant.
!corrigendum 3.3.1(18/1)
Replace the paragraph:
3.
The object is created, and, if there is not an initialization expression, any per-object constraints (see 3.8) are elaborated and any implicit initial values for the object or for its subcomponents are obtained as determined by the nominal subtype.
by:
3.
The object is created, and, if there is not an initialization expression, the object is initialized by default. When an object is initialized by default, any per-object constraints (see 3.8) are elaborated and any implicit initial values for the object or for its subcomponents are obtained as determined by the nominal subtype. Any initial values (whether explicit or implicit) are assigned to the object or to the corresponding subcomponents. As described in 5.2 and 7.6, Initialize and Adjust procedures can be called.
!corrigendum 3.3.1(19)
Delete the paragraph:
4.
Any initial values (whether explicit or implicit) are assigned to the object or to the corresponding subcomponents. As described in 5.2 and 7.6, Initialize and Adjust procedures can be called.
!corrigendum 3.3.1(20)
Replace the paragraph:
For the third step above, the object creation and any elaborations and evaluations are performed in an arbitrary order, except that if the default_expression for a discriminant is evaluated to obtain its initial value, then this evaluation is performed before that of the default_expression for any component that depends on the discriminant, and also before that of any default_expression that includes the name of the discriminant. The evaluations of the third step and the assignments of the fourth step are performed in an arbitrary order, except that each evaluation is performed before the resulting value is assigned.
by:
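For the third step above, evaluations and assignments are performed in an arbitrary order subject to the following restrictions:
- Assignment to any part of the object is preceded
by the evaluation of the value that is to be assigned.
- The evaluation of a default_expression that includes the name of
a discriminant is preceded by the assignment to that discriminant.
- The evaluation of the default_expression for any component that
depends on a discriminant is preceded by the assignment to that discriminant.
- The assignments to any components, including implicit components,
not requiring late initialization must precede the initial value evaluations for any components requiring late initialization; if two components both require late initialization, then assignments to parts of the component occurring earlier in the order of the component declarations must precede the initial value evaluations of the component occurring later.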
!corrigendum 4.8(07)
Replace the paragraph:
For the evaluation of an allocator, the elaboration of the subtype_indication or the evaluation of the qualified_expression is performed first. For the evaluation of an initialized allocator, an object of the designated type is created and the value of the qualified_expression is converted to the designated subtype and assigned to the object.
by:
For the evaluation of an initialized allocator, the evaluation of the qualified_expression is performed first. An object of the designated type is created and the value of the qualified_expression is converted to the designated subtype and assigned to the object.
!corrigendum 4.8(08)
Replace the paragraph:
For the evaluation of an uninitialized allocator:
by:
For the evaluation of an uninitialized allocator, the elaboration of the subtype_indication is performed first. Then:
!corrigendum 4.8(10)
Replace the paragraph:
by:
!corrigendum 7.6(10)
Replace the paragraph:
During the elaboration of an object_declaration, for every controlled subcomponent of the object that is not assigned an initial value (as defined in 3.3.1), Initialize is called on that subcomponent. Similarly, if the object as a whole is controlled and is not assigned an initial value, Initialize is called on the object. The same applies to the evaluation of an allocator, as explained in 4.8.
by:
During the elaboration or evaluation of a construct that causes an object to be initialized by default, for every controlled subcomponent of the object that is not assigned an initial value (as defined in 3.3.1), Initialize is called on that subcomponent. Similarly, if the object that is initialized by default as a whole is controlled, Initialize is called on the object.
!corrigendum 7.6(11)
Replace the paragraph:
For an extension_aggregate whose ancestor_part is a subtype_mark, for each controlled subcomponent of the ancestor part, either Initialize is called, or its initial value is assigned, as appropriate; if the type of the ancestor part is itself controlled, the Initialize procedure of the ancestor type is called, unless that Initialize procedure is abstract.
by:
For an extension_aggregate whose ancestor_part is a subtype_mark denoting a controlled subtype, the Initialize procedure of the ancestor type is called, unless that Initialize procedure is abstract.
!ACATS test
Create an ACATS test to check that cases like the example evaluate in the correct order.
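One possible shape for such a test is sketched below. It is not an actual ACATS test and does not use the ACATS Report package; the names Init_Order_Check, Probe, and Failures are hypothetical, and results are printed with Ada.Text_IO purely to keep the sketch self-contained. It follows the example by placing an ordinary component between two components that require late initialization, and it exercises both an object declaration and an uninitialized allocator.

```ada
with Ada.Text_IO;

procedure Init_Order_Check is

   type Outer;

   --  Hypothetical probe, not part of the proposal.
   function Probe (X : access Outer) return Boolean;

   type Outer is limited record
      F1 : Boolean := Probe (Outer'Access);  --  requires late initialization
      F2 : Natural := 42;                    --  ordinary component
      F3 : Boolean := Probe (Outer'Access);  --  requires late initialization
   end record;

   type Outer_Ptr is access Outer;

   Failures : Natural := 0;

   function Probe (X : access Outer) return Boolean is
   begin
      --  F2 does not require late initialization, so it should already
      --  hold its default value whenever Probe is called.
      if X.F2 /= 42 then
         Failures := Failures + 1;
      end if;
      return True;
   end Probe;

   Declared  : Outer;                            --  object_declaration
   Allocated : constant Outer_Ptr := new Outer;  --  uninitialized allocator

begin
   if Failures = 0 then
      Ada.Text_IO.Put_Line ("PASSED");
   else
      Ada.Text_IO.Put_Line ("FAILED");
   end if;
end Init_Order_Check;
```

On an implementation that does not follow the proposed ordering, Probe reads a component that has not been initialized, so the observed behavior is not guaranteed to be a clean "FAILED" report.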
!appendix

!subject Undefined discriminants caused by loose order of init requirements
!reference RM95-3.3.1(20)
!from Bob Duff
!problem

The rules of 3.3.1(20) are too lax -- they allow one to refer to
uninitialized discriminants.  Here's an example:

    type String_Ptr is access all String;
    type Rec(Name: String_Ptr) is limited private;

    function Init(X: access Rec) return Boolean;

    type Rec(Name: String_Ptr) is limited
        record
            Comp: Boolean := Init(Rec'Access);
        end record;

    function Init(X: access Rec) return Boolean is
    begin
        Put_Line(X.all.Name.all); -- Is this erroneous?
        return True;
    end Init;

    Thing: Rec(Name => new String'("Thing"));

This example is similar to one that came from real code (written by me).
It looks silly here, but I thought I had good reasons...

Anyway, the AdaMagic compiler generated code to call Init *before*
initializing Thing.Name.  Init thus refers to that discriminant in its
raw undefined state.  This bad behavior does not seem to be forbidden by
the RM.  GNAT apparently chose a better order.

We plan to fix our compiler (in fact, I think Tuck already did so),
but it seems like this is a hole in the RM.  You're not supposed to be
able to get at undefined discriminants without using chapter-13-ish
features.

The problem here is that 3.3.1(20) talks about direct references to
discriminants, forgetting that one can get at the discriminants via a
pointer as shown above.  We should probably add something like "All
discriminants are initialized before evaluating any expression
containing the name of the current instance..."

The alternative would be to declare the above erroneous.  I don't much
like that -- after all, I tripped over this by accident, and I wasn't
messing around with low-level chap-13 junk.

P.S. I'm impressed that Tucker tracked down the cause of this bug
quickly.  It occurred in a 150,000-line program.

****************************************************************

From: Tucker Taft
Sent: Saturday, January 17, 2004  4:34 PM

Discriminants aren't the only problem.  You have access to
the whole record.  Various pointer fields might have stack
junk in them, and dereferencing them would presumably be erroneous.
To be really safe, we would have to initialize all pointer fields
to null before evaluating any default initial expressions that
involve enclosing-rec'access.  But what about pointers that
are of a "not null" subtype?  They can presumably be assumed to be
non-null under normal circumstances, so no access-check is required
before dereferencing them.  And what about plain old integer fields
that are supposedly, say, integer range 1..100?  Do we have to be
sure they are preinitialized to some in-range value before
passing blah'access?

The overall implications are pretty worrisome.  Java "solves"
this problem by first initializing everything to 0, null, etc,
and then doing further initialization.  But as indicated above,
that doesn't solve the problem for Ada because a zero-ish value
is not reasonable for all subtypes.

I think we almost have to say dereferencing a value produced by
passing enclosing_rec'Access may lead to erroneous execution
if the dereference occurs prior to the completion of default
initialization.  We could special-case discriminants, but I'm
not convinced that is really doing the programmer a big favor.
They have to treat these access values with "kid" gloves in
any case.  Unfortunately, I can't think of a way to protect
against the problem, short of disallowing the use of
enclosing_rec'access as an actual parameter in a function call in
a component default initial expression.

> ...
> The alternative would be to declare the above erroneous.  I don't much
> like that -- after all, I tripped over this by accident, and I wasn't
> messing around with low-level chap-13 junk.

Unfortunately discriminants are only the tip of the iceberg...

> P.S. I'm impressed that Tucker tracked down the cause of this bug
> quickly.  It occurred in a 150,000-line program.

Aw shucks.

****************************************************************

From: Robert A. Duff
Sent: Sunday, January 18, 2004  11:37 PM

> Discriminants aren't the only problem.  You have access to
> the whole record.

Yes, I see that now, but I still think it's a good idea to special-case
discriminants.  3.3.1(20) already goes to some trouble to make sure you
can't refer to discriminants before they've been initialized.  Other
components seem less worrisome, somehow.

I'm not sure I believe the above argument...

If I were the boss, initialization would happen in order (textual order
of the component declarations), for all record types.  I doubt that's
going to fly. ;-)
At least the user could know which components must be initialized.
Maybe we could add such a rule, but only for records where 'Access (or
'Unchecked_Access) is used in a potentially damaging way.

We certainly don't want to make my example illegal.  The main reason for
passing the 'Access was to create two records pointing at each other.
I.e., two record types declared in a decoupled way that actually
represent a single concept in the programmer's mind.  That seems like
a legitimate thing to do.  The problem was that in addition to saving
that pointer away, I took a quick peek at one of the discriminants.
We can't tell at compile time when that's going to happen.

****************************************************************

From: Gary Dismukes
Sent: Monday, January 19, 2004  4:11 PM

> Yes, I see that now, but I still think it's a good idea to special-case
> discriminants.  3.3.1(20) already goes to some trouble to make sure you
> can't refer to discriminants before they've been initialized.  Other
> components seem less worrisome, somehow.

3.3.1(20) goes to some trouble, but it seems that the current wording
doesn't even handle the cases of discriminant defaults properly,
or at least the wording is sloppy, because the last sentence still
allows component and discriminant assignments to happen after all
of the evaluations.  In any case, I agree there are problems here,
and it would be nice to fix the wording to address more cases.

> If I were the boss, initialization would happen in order (textual order
> of the component declarations), for all record types.  I doubt that's
> going to fly. ;-)
> At least the user could know which components must be initialized.
> Maybe we could add such a rule, but only for records where 'Access (or
> 'Unchecked_Access) is used in a potentially damaging way.

I think it would be reasonable to add additional constraints on
evaluation and initialization along the lines of what's already
done for delaying Initialize calls for controlled components
constrained by per-object expressions as specified in 7.6(12).
The AARM says:

  12.b   The fact that Initialize is done for components with access
  discriminants after other components allows the Initialize operation
  for a component with a self-referential access discriminant to assume
  that other components of the enclosing object have already been
  properly initialized.  For multiple such components, it allows some
  predictability.

The case of normal component initializations seems similar to this
when per-object expressions are involved in defaults.  So it would
seem reasonable to add rules requiring components initialized by
these special expressions to delay evaluation until earlier components
that don't involve per-object expressions have all been initialized.

****************************************************************

From: Randy Brukardt
Sent: Monday, June 7, 2004  7:40 PM

The AI doesn't contain a summary, question, or recommendation.
Recommendation can be "(See summary.)", but the other two are required.
[Editor's note: this is about version /01 of the AI.]

The last paragraph of the new wording refers to the "fourth step", but there
is no fourth step anymore.

I think that Note 6 needs to be stated somewhere, at least in the AARM. But
it really is something users ought to know, so explicit mention in the
Standard would be a good idea.

****************************************************************

From: Stephen W Baird
Sent: Tuesday, June 8, 2004  11:52 PM

> I think that Note 6 needs to be stated somewhere, at least in the AARM. But
> it really is something users ought to know, so explicit mention in the
> Standard would be a good idea.

I agree that it belongs at least in the AARM.
I'm not sure it belongs in the standard because it seems that 13.9.1 already
covers this. Note the words "and any explicit or default initializations have
been performed" in 13.9.1(4).

****************************************************************

From: Bob Duff
Sent: Monday, June 6, 2005  12:33 PM

I'm using draft 11.8 of the [A]ARM.

3.3.1(18.a) says:

    18.a  Discussion: For a per-object constraint that contains some
          per-object expressions and some non-per-object expressions, the
          values used for the constraint consist of the values of the
          non-per-object expressions evaluated at the point of the
          type_declaration, and the values of the per-object expressions
          evaluated at the point of the creation of the object.

Is this still correct, given new rules about per-object constraints?

****************************************************************

From: Tucker Taft
Sent: Monday, June 6, 2005  4:33 PM

I would say it is still true.  We might have changed the
order in which things are done, but the notion of what
is a "per-object constraint" hasn't changed.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, June 7, 2005  3:41 AM

I agree.

****************************************************************

From: Bob Duff
Sent: Monday, June 6, 2005  12:34 PM

I'm using draft 11.8 of the [A]ARM.

3.3.1(20.h/2):

    20.h/2 It is possible for there to be more than one component that
          requires late initialization. In this case, the language can't
          prevent problems, because all of the components can't be the last
          one initialized. In this case, we specify the order of
          initialization for components requiring late initialization; by
          doing so, programmers can arrange their code to avoid accessing
          uninitialized components, and such arrangements are portable. Note
          that if the program accesses an uninitialized component, 13.9.1
          defines the execution to be erroneous.
                                      ^^^^^^^^^
Shouldn't that be "bounded error"?

****************************************************************

From: Stephen W Baird
Sent: Monday, June 6, 2005  2:54 PM

> Shouldn't that be "bounded error"?

No; I think that "erroneous" is correct here.

13.9.1(4) states that an object is not guaranteed to be normal until "any
explicit or default initializations have been performed".
13.9.1(8) states that "it is erroneous to evaluate ... an abnormal
object".

Note that we're not just talking about invalid scalars here - we're
talking about uninitialized access values, tasks, discriminants, etc.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, June 7, 2005  3:25 AM

> Shouldn't that be "bounded error"?

I believe it is definitely "erroneous", since we are talking about accessing
discriminants before they have been initialized and other very nasty
things here.

****************************************************************
