Version 1.3 of ais/ai-00373.txt

!standard 03.03.01(20)          04-06-08 AI95-00373/02
!class binding interpretation 04-02-05
!status received 04-01-17
!priority Low
!difficulty Hard
!subject Undefined discriminants caused by loose order of init requirements
!summary
The "arbitrary order" of component initialization referred to in 3.3.1(20) gives the implementation too much freedom. In particular, it allows the implementation to choose an ordering which leads to problems with uninitialized discriminants, uninitialized access values, etc. This problem is reduced, but not eliminated, by imposing additional restrictions on the order that an implementation may choose.
!question
The rules of 3.3.1(20) are too lax -- they allow one to refer to uninitialized discriminants, uninitialized access values, etc. Was this intended? (No.)
!recommendation
(See summary.)
!wording
Replace 3.3.1(18/1 - 20) with
3. The object is created, and, if there is not an initialization expression, any per-object expressions (see 3.8) are elaborated and any implicit initial values for the object or for its subcomponents are obtained as determined by the nominal subtype. Any initial values (whether explicit or implicit) are assigned to the object or to the corresponding subcomponents. As described in 5.2 and 7.6, Initialize and Adjust procedures can be called.
For the third step above, evaluations and assignments are performed in an arbitrary order subject to the following restrictions:
- Assignment to any part of the object is preceded
by the evaluation of the value that is to be assigned.
- The evaluation of a default_expression that includes the name of
a discriminant is preceded by the assignment to that discriminant.
- The evaluation of the default_expression for any component that
depends on a discriminant is preceded by the assignment to that discriminant.
Furthermore, a component of an object is said to require late initialization if it has an access discriminant value constrained by a per-object expression, or if it has an initialization expression which includes a name denoting the current instance of the type or denoting an access discriminant. For the third step above, the assignments to any components not requiring late initialization must precede the initial value evaluations for any components requiring late initialization; if two components both require late initialization, then the assignment to the component occurring earlier in the order of the component declarations must precede the initial value evaluation of the component occurring later.
!example
    procedure Test is
        type Inner;
        type Outer;

        function Flag_Init (X : access Inner) return Boolean;

        type Inner (Discrim : access Outer) is limited
            record
                Flag : Boolean := Flag_Init (Inner'access);
            end record;

        type Type_With_Lots_Of_Interesting_Components is
            -- contains tasks, discriminants, access values, protected records,
            -- controlled types, or whatever gives you heartburn
            ... ;

        type Outer is limited
            record
                F1 : Inner (Outer'access);
                F2 : Type_With_Lots_Of_Interesting_Components;
                F3 : Inner (Outer'access);
            end record;

        procedure Do_All_Sorts_Of_Things
            (Interesting : in out Type_With_Lots_Of_Interesting_Components)
            is separate;
        --
        -- abort task components, dereference access values, etc.

        function Flag_Init (X : access Inner) return Boolean is
        begin
            Do_All_Sorts_Of_Things (X.Discrim.all.F2);
            return True;
        end Flag_Init;

        Problematic : Outer;
        --
        -- If F2 field is not initialized ahead of both the F1 and F3 fields,
        -- then Do_All_Sorts_Of_Things will be invoked on a record containing
        -- uninitialized tasks, discriminants, access values, etc.

        procedure Do_Something is separate;

    begin
        Do_Something;
    end Test;
!discussion
An implementation is currently given too much freedom in choosing the order in which components are initialized. The problem is illustrated by the preceding example (see also Bob Duff's initial description of the problem).
This proposal does not completely solve the problem of evaluating uninitialized components (e.g. discriminants, access values, tasks, etc.), but it greatly reduces the chances of inadvertently introducing such a problem, either by writing new Ada source or by compiling existing source with another compiler.
If a type has two components which both "require late initialization", then one of them is going to be initialized first and problems may arise if this initialization involves evaluation of the second component (or any part thereof). Portability is improved by nailing down the order in which such components are initialized.
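The classification can be illustrated with a small, self-contained sketch. All of the names below (Late_Init_Demo, Inner, Outer, Peek, Late, Count, Probe) are hypothetical and chosen only for illustration. Under the proposed wording, Late and Probe would "require late initialization" while Count would not, so the assignment to Count is expected to precede the call to Peek made while evaluating the default expression of Probe.

```ada
with Ada.Text_IO;

procedure Late_Init_Demo is

   type Outer;

   --  Hypothetical component type with an access discriminant.
   type Inner (Parent : access Outer) is limited
      record
         Seen : Integer := 0;
      end record;

   --  Hypothetical function that peeks at the enclosing object
   --  while that object is still being initialized.
   function Peek (X : access Outer) return Integer;

   type Outer is limited
      record
         --  Access discriminant constrained by a per-object expression:
         --  requires late initialization.
         Late  : Inner (Outer'Access);

         --  Ordinary default expression:
         --  does not require late initialization.
         Count : Integer := 42;

         --  Default expression names the current instance:
         --  requires late initialization.
         Probe : Integer := Peek (Outer'Access);
      end record;

   function Peek (X : access Outer) return Integer is
   begin
      --  Reads a sibling component of the object under construction.
      return X.Count;
   end Peek;

   Obj : Outer;

begin
   Ada.Text_IO.Put_Line (Integer'Image (Obj.Probe));
end Late_Init_Demo;
```

Because Late is declared before Probe, the proposed rule would also require the assignment to Late to precede the evaluation of Peek (Outer'Access). The sketch does not cover the residual case in which the initialization of one late component reads another late component that is declared after it.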
Technical notes:
1) A discriminant never requires late initialization because RM 8.3(17) implies
that the current instance of a type cannot be named in the discriminant part of the type.
2) The reference to "the order of the component declarations" is intended to
echo 7.6(12).
3) Given this example

       type T (D : Some_Discrete_Type) is limited
           record
               F1 : T1 (T'access);
               F2 : T2 (D);
               F3 : T3 := F (D);
           end record;

F2 and F3 do not "require late initialization". We want F1 initialized last in this case. Evaluation of a scalar discriminant does not cause a component to "require late initialization".
4) Given this example

       type Tt (Dd : Some_Type := Some_Value) is
           record
               Ff : Some_Type := Dd;
           end record;

the current wording of 3.3.1(20) allows the following (bad) order:
   a) Evaluate the initial value of the Dd component.
   b) Evaluate the initial value of the Ff component.
   c) Assign the value computed in step a to the Dd component.
   d) Assign the value computed in step b to the Ff component.
The problem is that step b involves the evaluation of a component which is not initialized until step c. No reasonable implementation would choose this order, but the language definition shouldn't rely on that.
5) The former steps #3 and #4 (i.e. 3.3.1(18/1 and 19)) are now merged into
one step. It did not make sense to leave them as separate steps because 3.3.1(15) explicitly states that the steps are to be performed sequentially.
6) Should it be stated explicitly that the evaluation-before-initialization
cases that this proposal does not prevent (i.e. cases involving two or more components which require late initialization) result in erroneous execution? Would an AARM note be appropriate?
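The situation in technical note 4 can be made concrete with a minimal sketch. Natural and the value 7 are assumptions standing in for Some_Type and Some_Value, and the procedure name is hypothetical. With the proposed restriction that evaluation of a default_expression naming a discriminant is preceded by the assignment to that discriminant, the check below is expected to pass on any conforming implementation.

```ada
procedure Discriminant_Default_Demo is

   --  Natural and 7 stand in for Some_Type and Some_Value.
   type Tt (Dd : Natural := 7) is
      record
         Ff : Natural := Dd;  --  default expression names the discriminant
      end record;

   Obj : Tt;  --  default-initialized: Dd first, then Ff

begin
   --  Ff must have been computed from an already-assigned Dd.
   if Obj.Ff /= Obj.Dd then
      raise Program_Error;
   end if;
end Discriminant_Default_Demo;
```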
--!corrigendum
!ACATS test
!appendix

!subject Undefined discriminants caused by loose order of init requirements
!reference RM95-3.3.1(20)
!from Bob Duff
!problem

The rules of 3.3.1(20) are too lax -- they allow one to refer to
uninitialized discriminants.  Here's an example:

    type String_Ptr is access all String;
    type Rec(Name: String_Ptr) is limited private;

    function Init(X: access Rec) return Boolean;

    type Rec(Name: String_Ptr) is limited
        record
            Comp: Boolean := Init(Rec'Access);
        end record;

    function Init(X: access Rec) return Boolean is
    begin
        Put_Line(X.all.Name.all); -- Is this erroneous?
        return True;
    end Init;

    Thing: Rec(Name => new String'("Thing"));

This example is similar to one that came from real code (written by me).
It looks silly here, but I thought I had good reasons...

Anyway, the AdaMagic compiler generated code to call Init *before*
initializing Thing.Name.  Init thus refers to that discriminant in its
raw undefined state.  This bad behavior does not seem to be forbidden by
the RM.  GNAT apparently chose a better order.

We plan to fix our compiler (in fact, I think Tuck already did so),
but it seems like this is a hole in the RM.  You're not supposed to be
able to get at undefined discriminants without using chapter-13-ish
features.

The problem here is that 3.3.1(20) talks about direct references to
discriminants, forgetting that one can get at the discriminants via a
pointer as shown above.  We should probably add something like "All
discriminants are initialized before evaluating any expression
containing the name of the current instance..."

The alternative would be to declare the above erroneous.  I don't much
like that -- after all, I tripped over this by accident, and I wasn't
messing around with low-level chap-13 junk.

P.S. I'm impressed that Tucker tracked down the cause of this bug
quickly.  It occurred in a 150,000-line program.

****************************************************************

From: Tucker Taft
Sent: Saturday, January 17, 2004  4:34 PM

Discriminants aren't the only problem.  You have access to
the whole record.  Various pointer fields might have stack
junk in them, and dereferencing them would presumably be erroneous.
To be really safe, we would have to initialize all pointer fields
to null before evaluating any default initial expressions that
involve enclosing-rec'access.  But what about pointers that
are of a "not null" subtype?  They can presumably be assumed to be
non-null under normal circumstances, so no access-check is required
before dereferencing them.  And what about plain old integer fields
that are supposedly, say, integer range 1..100?  Do we have to be
sure they are preinitialized to some in-range value before
passing blah'access?

The overall implications are pretty worrisome.  Java "solves"
this problem by first initializing everything to 0, null, etc,
and then doing further initialization.  But as indicated above,
that doesn't solve the problem for Ada because a zero-ish value
is not reasonable for all subtypes.

I think we almost have to say dereferencing a value produced by
passing enclosing_rec'Access may lead to erroneous execution
if the dereference occurs prior to the completion of default
initialization.  We could special-case discriminants, but I'm
not convinced that is really doing the programmer a big favor.
They have to treat these access values with "kid" gloves in
any case.  Unfortunately, I can't think of a way to protect
against the problem, short of disallowing the use of
enclosing_rec'access as an actual parameter in a function call in
a component default initial expression.

> ...
> The alternative would be to declare the above erroneous.  I don't much
> like that -- after all, I tripped over this by accident, and I wasn't
> messing around with low-level chap-13 junk.

Unfortunately discriminants are only the tip of the iceberg...

> P.S. I'm impressed that Tucker tracked down the cause of this bug
> quickly.  It occurred in a 150,000-line program.

Aw shucks.

****************************************************************

From: Robert A. Duff
Sent: Sunday, January 18, 2004  11:37 PM

> Discriminants aren't the only problem.  You have access to
> the whole record.

Yes, I see that now, but I still think it's a good idea to special-case
discriminants.  3.3.1(20) already goes to some trouble to make sure you
can't refer to discriminants before they've been initialized.  Other
components seem less worrisome, somehow.

I'm not sure I believe the above argument...

If I were the boss, initialization would happen in order (textual order
of the component declarations), for all record types.  I doubt that's
going to fly. ;-)
At least the user could know which components must be initialized.
Maybe we could add such a rule, but only for records where 'Access (or
'Unchecked_Access) is used in a potentially damaging way.

We certainly don't want to make my example illegal.  The main reason for
passing the 'Access was to create two records pointing at each other.
I.e., two record types declared in a decoupled way that actually
represent a single concept in the programmer's mind.  That seems like
a legitimate thing to do.  The problem was that in addition to saving
that pointer away, I took a quick peek at one of the discriminants.
We can't tell at compile time when that's going to happen.

****************************************************************

From: Gary Dismukes
Sent: Monday, January 19, 2004  4:11 PM

> Yes, I see that now, but I still think it's a good idea to special-case
> discriminants.  3.3.1(20) already goes to some trouble to make sure you
> can't refer to discriminants before they've been initialized.  Other
> components seem less worrisome, somehow.

3.3.1(20) goes to some trouble, but it seems that the current wording
doesn't even handle the cases of discriminant defaults properly,
or at least the wording is sloppy, because the last sentence still
allows component and discriminant assignments to happen after all
of the evaluations.  In any case, I agree there are problems here,
and it would be nice to fix the wording to address more cases.

> If I were the boss, initialization would happen in order (textual order
> of the component declarations), for all record types.  I doubt that's
> going to fly. ;-)
> At least the user could know which components must be initialized.
> Maybe we could add such a rule, but only for records where 'Access (or
> 'Unchecked_Access) is used in a potentially damaging way.

I think it would be reasonable to add additional constraints on
evaluation and initialization along the lines of what's already
done for delaying Initialize calls for controlled components
constrained by per-object expressions as specified in 7.6(12).
The AARM says:

  12.b   The fact that Initialize is done for components with access
  discriminants after other components allows the Initialize operation
  for a component with a self-referential access discriminant to assume
  that other components of the enclosing object have already been
  properly initialized.  For multiple such components, it allows some
  predictability.

The case of normal component initializations seems similar to this
when per-object expressions are involved in defaults.  So it would
seem reasonable to add rules requiring components initialized by
these special expressions to delay evaluation until earlier components
that don't involve per-object expressions have all been initialized.

****************************************************************

From: Randy Brukardt
Sent: Monday, June 7, 2004  7:40 PM

The AI doesn't contain a summary, question, or recommendation.
Recommendation can be "(See summary.)", but the other two are required.
[Editor's note: this is about version /01 of the AI.]

The last paragraph of the new wording refers to the "fourth step", but there
is no fourth step anymore.

I think that Note 6 needs to be stated somewhere, at least in the AARM. But
it really is something users ought to know, so explicit mention in the
Standard would be a good idea.

****************************************************************

From: Stephen W Baird
Sent: Tuesday, June 8, 2004  11:52 PM

> I think that Note 6 needs to be stated somewhere, at least in the AARM. But
> it really is something users ought to know, so explicit mention in the
> Standard would be a good idea.

I agree that it belongs at least in the AARM.
I'm not sure it belongs in the standard because it seems that 13.9.1 already
covers this. Note the words "and any explicit or default initializations have
been performed" in 13.9.1(4).

****************************************************************
