Version 1.4 of ai05s/ai05-0027-1.txt

!standard A.18.2(239/2)          07-10-28 AI05-0027-1/03
!standard A.18.3(152/2)
!standard A.18.4(75/2)
!standard A.18.7(96/2)
!class binding interpretation 06-11-13
!status work item 06-11-13
!status received 06-11-03
!priority Medium
!difficulty Easy
!qualifier Omission
!subject Behavior of container operations when passed finalized container objects
!summary
If an operation is passed a container object that has been finalized, it is a bounded error. If the operation is read-only, then either the operation proceeds as though the container were empty, or Constraint_Error or Program_Error is raised.
If a finalized container is passed to an operation that would update the container, Constraint_Error or Program_Error is raised.
!question
What happens in container operations when passed a finalized container object?
!recommendation
(See Summary.)
Add the following after A.18.2(239):
It is a bounded error to call any subprogram declared in Containers.Vectors when the associated container has been finalized. If the operation takes Container as an IN OUT parameter, then it raises Constraint_Error or Program_Error. Otherwise, the operation either proceeds as it would for an empty container, or it raises Constraint_Error or Program_Error.
Make corresponding additions to all other container packages, after A.18.3(152), after A.18.4(75) [needs "Bounded Error" label as well], and after A.18.7(96) [needs "Bounded Error" label].
!discussion
It requires some circumlocution (though see the example for a plausible situation) to create a situation where an operation can be applied to a finalized container. Nevertheless, we don't want such an operation to de-finalize the container, and thereby create storage leaks or worse. Hence, we do not allow any operation that updates the container to proceed, and any other operation either raises an exception, or proceeds as though the container were empty. This minimizes any distributed overhead associated with handling the finalized case (because all non-updating operations need not distinguish a finalized state from an empty state), while ensuring that no storage leak or potentially erroneous execution is introduced by such "late" operations.
We allow Constraint_Error rather than Program_Error for efficiency reasons as well, since many of the operations that attempt to update an empty container will already raise Constraint_Error. It is only the updating operations that allow for the container to start out empty that will need an explicit check for the finalized state.
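As an illustration only (it is not part of the proposed wording), the explicit check mentioned above might be sketched as follows. Toy.Stack, its Finalized component, and Finalized_Check_Demo are hypothetical names invented for this sketch; a real implementation of Ada.Containers.Vectors could record the finalized state quite differently. The explicit call to Finalize merely simulates the container having been finalized early.
    with Ada.Finalization;
    with Ada.Text_IO;

    procedure Finalized_Check_Demo is

        package Toy is
            -- Hypothetical stand-in for a container type.
            type Stack is new Ada.Finalization.Limited_Controlled with private;

            procedure Push (Container : in out Stack; Item : Integer);
            function Length (Container : Stack) return Natural;
            overriding
            procedure Finalize (Container : in out Stack);
        private
            type Int_Array is array (1 .. 10) of Integer;
            type Stack is new Ada.Finalization.Limited_Controlled with record
                Items     : Int_Array := (others => 0);
                Count     : Natural := 0;
                Finalized : Boolean := False; -- assumed state flag
            end record;
        end Toy;

        package body Toy is
            procedure Push (Container : in out Stack; Item : Integer) is
            begin
                -- An updating operation that is legal on an empty container,
                -- so it needs the explicit check for the finalized state.
                if Container.Finalized then
                    raise Program_Error with "container has been finalized";
                end if;
                Container.Count := Container.Count + 1;
                Container.Items (Container.Count) := Item;
            end Push;

            function Length (Container : Stack) return Natural is
            begin
                -- Read-only operation: Finalize leaves Count at 0, so a
                -- finalized container simply looks empty; no check needed.
                return Container.Count;
            end Length;

            procedure Finalize (Container : in out Stack) is
            begin
                -- Safe to call more than once; later calls change nothing.
                Container.Count := 0;
                Container.Finalized := True;
            end Finalize;
        end Toy;

        S : Toy.Stack;

    begin
        Toy.Push (S, 1);
        Toy.Finalize (S); -- simulate finalization before the last use
        Ada.Text_IO.Put_Line
          ("Length after finalization:" & Natural'Image (Toy.Length (S)));
        begin
            Toy.Push (S, 2);
        exception
            when Program_Error =>
                Ada.Text_IO.Put_Line ("Push raised Program_Error");
        end;
    end Finalized_Check_Demo;
In this sketch only Push pays for a test of the flag. Length relies on Finalize having left the object in a state indistinguishable from empty, which is the kind of saving the discussion above has in mind.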
We suggest adding this paragraph before the discussion of ambiguous cursors, as we want that discussion to flow right into the discussion of invalid cursors.
!example
Access to a finalized container can happen if a Finalize routine accesses a container object declared in the same master as the Finalize routine. This is more likely than it appears at first glance.
Imagine for a moment that someone needs to instrument a controlled type to collect some usage information. The program might be structured like:
    package Pack is
        type T is new Ada.Finalization.Controlled with ...
        overriding
        procedure Finalize (Object : in out T);

        type Acc_T is access all T'Class;
        A_List : Acc_T;
    end Pack;

    package body Pack is
        package Usage_History is new Ada.Containers.Vectors (...);
        Finalization_Usage : Usage_History.Vector;

        procedure Finalize (Object : in out T) is
        begin
            -- operations on Finalization_Usage to collect data on the
            -- finalization of objects of T.
        end Finalize;
    end Pack;
When the program finalizes, any allocated objects remaining on A_List will be finalized when the finalization of the collection for type Acc_T occurs. However, since finalization occurs in reverse order of declaration, the object Finalization_Usage will be finalized before the collection for Acc_T. So those finalizations will be accessing an object that is already finalized.
Note that it doesn't matter where A_List is declared; it only matters where the access type is declared relative to the container object. It's not unusual for that type to be declared visibly (this program structure is taken from the CLAW Windows interface library), and good Ada practice is to hide things in the body when possible. But doing so can easily cause finalization anomalies (see AI95-00280 for some examples).
--!corrigendum A.18.2(239/2)
!ACATS test
!appendix

From: Pascal Leroy
Date: Wednesday, November  8, 2006  6:02 AM

The RM doesn't seem to say what happens when you operate on a finalized
container.

I believe that any subprogram call that involves a finalized container
should raise P_E: there doesn't seem to be any reason to make this
unspecified or erroneous (it's easy enough to detect: just keep a
Finalized bit in the container header), and there doesn't seem to be any
reason to try and define sensible semantics (who plays with finalized
containers anyway?).  Such an occurrence is likely to be a bug, and the
user will be glad to hear about it.

This should be nailed down in the RM.

****************************************************************

From: Robert Dewar
Date: Wednesday, November  8, 2006  6:09 AM

I would prefer this to be erroneous, to avoid the extra overhead of
the test. If an implementation decides unconditionally or under
control of some flag that it should make the check, it is free to
do so.

****************************************************************

From: Robert I. Eachus
Date: Wednesday, November  8, 2006  3:56 PM

>This should be nailed down in the RM.

At best this is a bad idea, at worst, a disaster for existing code.  
Let's think this through, you want a bit, let's call it 'Finalized in 
every container.  Shades of 'Terminated in tasks returned from inside 
functions.  (I guess that was before your involvement with the ARG.)  
The AI gave the legalistic obvious answer, and as a result tasking in 
the DEC compiler in particular became a lot less useful.  We invented 
the term pathology to describe that kind of AI, but, as you would 
expect, it took years for the damage to be undone.

Let's look at the properties of a hypothetical 'Finalized attribute. 
Without the proposed rule, it is harmless.  But add the requirement to 
raise P_E and it becomes a disaster.  A call to Foo'Finalized will 
either return True, or raise Program_Error.  Oops!  Similarly no 
procedure can finalize a container passed as an in out parameter, or 
call any procedure or function that would finalize a container passed in 
as a parameter.

What do we gain for this?  Safety?  Hardly.  Dave Emery referred to 
Storage_Error as a parachute that opens on impact.  With this rule 
finalized containers become hand grenades that explode when touched.  
Adding the attribute, or better, a pair of Finalized functions to 
Ada.Controlled might be a nice improvement for the next version of Ada.  
That would allow subprograms that could be called with a finalized 
object to determine the state of the object.  Of course, I can create a 
package My_Controlled, that does all that for types descended from it.  
That, in some cases would be a safety improvement, and is otherwise 
harmless. 

Actually I have done something similar to this in a couple of 
instances.  If you are counting the (maximum) number of  controlled 
objects of a particular type, you don't want to decrement the count 
twice when an object is finalized twice, or decrement the count when an 
object is finalized before initialization.  Putting a bit in the object 
that is set during initialization and reset if set by finalization fixes 
all that.  Why do you want to count the number of objects of a type?  
Usually for optimization purposes.  If you know how large the collection 
gets, you can use a (fixed size) array with a free list.  Yes, that 
change means changing the code later on, but the actual change is small, 
in number of lines, and often necessary to meet real-time requirements.  
(Including those that require all allocation to be done during 
initialization.)

****************************************************************

From: Matthew Heaney
Date: Wednesday, November  8, 2006  4:44 PM

> At best this is a bad idea, at worst, a disaster for existing code.  

I can't see how this follows from Pascal's suggestion.  All he's advocating
is using a state variable to remember whether finalization has occurred for
this container instance.  Containers already have state variables to detect
tampering, so what he's asking for is very modest.

In fact, you could use the state variables that already exist.  For example in
GNAT each container object has two counters ("busy" and "lock" if you've
perused the sources) to detect tampering with elements and cursors.  Right now
when a container object is initialized, those counters get set to 0.  When the
container is finalized, we check to ensure that they're 0 (to detect
finalization during Query_Element, etc), and if not to raise Program_Error.

To implement Pascal's suggestion, all you'd have to do is set one or both of
the counters to some nonce value (-1, say) in Finalize.  Other container
operations would simply check the counter on entry, and raise PE if the counter
were less than 0.

****************************************************************

From: Robert Dewar
Date: Wednesday, November  8, 2006  6:41 PM

Sounds reasonable to me ... makes me change my mind on this issue, I 
think raising PE makes sense.

****************************************************************

From: Matthew Heaney
Date: Wednesday, November  8, 2006  9:19 PM

I am curious, though, about what motivated Pascal's suggestion: under what
circumstances would someone be able to refer to an object that has been
finalized?  If finalization happens immediately prior to the object being
destroyed, then how can someone refer to the object at all?

****************************************************************

From: Randy Brukardt
Date: Wednesday, November  8, 2006  9:39 PM

Actually, it's quite easy to create such cases, although the code is pretty
unlikely in practice. (OTOH, cases like it occurred in early versions of
Claw, so it's not impossible for them to happen in real code.) AI-280
discusses some related cases (that AI fixed several bugs in this area) and
has an example. Typically, the access occurs in the finalization routine of
some object declared in the same declarative_part. Remember that all of the
objects of a declarative part are finalized before any of them are destroyed
(the language is quite clear that the memory isn't deallocated for each
object individually).

****************************************************************

From: Robert I. Eachus
Date: Thursday, November  9, 2006  1:36 AM

Matt said: I can't see how this follows from Pascal's suggestion.  All he's advocating
is using a state variable to remember whether finalization has occurred for
this container instance.  Containers already have state variables to detect
tampering, so what he's asking for is very modest.

I thought I made it clear that the problem is not in marking finalized 
objects as finalized. Pascal also wants passing a finalized container as 
a parameter to raise Program_Error.   The problem is that if you have 
complex finalization code--the only case I can think of where this would 
matter--then you end up juggling hand grenades.

But I just realized that raising Program_Error is silly, and  thus falls 
under the Dewar rule.  Right now there are cases where an object gets 
finalized twice.  Should a second call to Finalize, raise Program_Error 
for a container?  After all Finalize is a subprogram with the object 
being finalized as a parameter.

That, of course, is not the case I really care about.  Often when 
finalizing components, especially those with access discriminants, you 
need to reference the containing object.  The AARM says this is 
intentionally supported:  *Implementation Note: *An implementation has 
to ensure that the storage for an object is not reclaimed when 
references to the object are still possible (unless, of course, the user 
explicitly requests reclamation via an instance of 
Unchecked_Deallocation). This implies, in general, that objects cannot 
be deallocated one by one as they are finalized; a subsequent 
finalization might reference an object that has been finalized, and that 
object had better be in its (well-defined) finalized state.

Should we allow direct references to components of containers when 
passing the container as a parameter will raise Program_Error?  That is 
my problem.  Writing error free finalization code is tough.  Why make it 
tougher?  This is why I said that a class of controlled objects with 
such a flag would be fine, but raising Program_Error when passed as a 
parameter is nasty.  For example, with Pascal's full proposal, I can write:

   begin
      if Some_Container.Finalized
      then null; -- never reached
      else Do_Something;
      end if;
   exception
      when Program_Error => Do_Something_Else;
   end;

But note that I can't wrap this in a procedure.  Well I can, but I can't 
have the container as a parameter.  That is what bothers me about 
Pascal's proposal.  Figuring out how to write finalization code for 
types which need some finalization operations to be done after the 
components are finalized is just one case I have run into.  (Yes it can 
be done, and it isn't all that bad.  But you end up finalizing the 
components inside the finalization routine for the parent, then letting 
the 'normal' finalization call later do nothing.  This is why I worry 
about what happens in the multiple finalization case.  I may have to dig 
up a mailbox implementation where I did just this, and in fact needed a 
flag in the components to keep the (real) finalization from doing things 
twice.)

Randy Brukardt wrote:

>Actually, it's quite easy to create such cases, although the code is pretty
>unlikely in practice. (OTOH, cases like it occurred in early versions of
>Claw, so it's not impossible for them to happen in real code.) AI-280
>discusses some related cases (that AI fixed several bugs in this area) and
>has an example. Typically, the access occurs in the finalization routine of
>some object declared in the same declarative_part. Remember that all of the
>objects of a declarative part are finalized before any of them are destroyed
>(the language is quite clear that the memory isn't deallocated for each
>object individually).

Exactly.  AFAIK, this is only an issue where you have either multiple 
objects being finalized when the parent subprogram is left, or when a 
controlled object has controlled components.  Note that the controlled 
components of a controlled object are finalized in some (unspecified) 
order (consistent with the requirement that components with access 
discriminants be finalized first).  I had one case where I added an 
access discriminant just to force the finalization order.  Yes, I could 
have checked whether the other component was already finalized--but that 
would require an access discriminant. ;-)

****************************************************************

From: Pascal Leroy
Date: Thursday, November  9, 2006  1:50 AM

> To implement Pascal's suggestion, all you'd have to do is set 
> one or both of the counters to some nonce value (-1, say) in 
> Finalize.  Other container operations would simply check the 
> counter on entry, and raise PE if the counter were less than 0.

Exactly.

Furthermore, we have to keep in mind that any object may be finalized
multiple times, so the first finalization has to set some piece of
information in the container header so that subsequent finalizations are
just no-ops.  As you point out, this could be done by setting a magic
value in the lock field.

****************************************************************

From: Pascal Leroy
Date: Thursday, November  9, 2006  2:10 AM

> Actually, it's quite easy to create such cases, although the 
> code is pretty unlikely in practice. (OTOH, cases like it 
> occurred in early versions of Claw, so it's not impossible 
> for them to happen in real code.)

Well, it can happen, and we probably agree that only the mildly insane
would do such a thing on purpose.  Hence the notion that users would love
to hear about that case.

****************************************************************

From: Pascal Leroy
Date: Thursday, November  9, 2006  3:30 AM

> Should a second call to Finalize, raise Program_Error 
> for a container?  After all Finalize is a subprogram with the object 
> being finalized as a parameter.

I am interested in calls to subprograms exported by the container
packages, and Finalize is not one of them (heck, the container may not
even be a controlled type, it might just be "magic").  If an
implementation raises P_E when Finalize is called twice, it just has a bug
because we don't allow exceptions to fly out of random places. 

> Should we allow direct references to components of containers when 
> passing the container as a parameter will raise 
> Program_Error?  That is 
> my problem.  Writing error free finalization code is tough.  
> Why make it 
> tougher?  This is why I said that a class of controlled objects with 
> such a flag would be fine, but raising Program_Error when passed as a 
> parameter is nasty.  For example, with Pascal's full 
> proposal, I can write:
> 
>    begin
>       if Some_Container.Finalized
>       then null; -- never reached
>       else Do_Something;
>       end if;
>    exception
>       when Program_Error => Do_Something_Else;
>    end;

This doesn't make any sense to me.  Containers are private types, so they
don't export components.  Therefore, user code cannot do anything like the
above.  Now if that code is in the body of a container package you can
obviously access the Finalized component, and if you do that improperly you
might have a bug.  Shrug.

****************************************************************

From: Robert Dewar
Date: Thursday, November  9, 2006  5:59 AM

> But I just realized that raising Program_Error is silly, and  thus falls 
> under the Dewar rule.  Right now there are cases where an object gets 
> finalized twice.  Should a second call to Finalize, raise Program_Error 
> for a container?  After all Finalize is a subprogram with the object 
> being finalized as a parameter.

No, a second finalize call should be ignored

****************************************************************

From: Robert I. Eachus
Date: Thursday, November  9, 2006  10:44 AM

>Furthermore, we have to keep in mind that any object may be finalized
>multiple times, so the first finalization has to set some piece of
>information in the container header so that subsequent finalizations are
>just no-ops.  As you point out, this could be done by setting a magic
>value in the lock field.

Pascal and I apparently crossed in the mail, but I see now that he is 
only proposing that operations in the container packages raise 
Program_Error, with a finalized parameter, not all subprograms.

However, I think my point still stands.  The extra code to raise 
Program_Error will only have an effect inside Finalization routines.  
There will be cases where calling operations on containers makes 
sense in (user) finalization routines, usually on user defined 
components of containers, probably with access discriminants.  (There is 
also the case of someone calling Unchecked_Deallocation on a container, 
or more realistically on an object that contains a container.  But I 
think if that happens outside the container package bodies, anything goes.)

****************************************************************

From: Randy Brukardt
Date: Thursday, December 14, 2006  12:31 AM

Ages ago, Matt Heaney wrote:

> I am curious, though, about what motivated Pascal's suggestion: under what
> circumstances would someone be able to refer to an object that has been
> finalized?  If finalization happens immediately prior to the object being
> destroyed, then how can someone refer to the object at all?

I was working on the minutes for that AI that came out of this discussion,
and I remembered that I wasn't convinced that it could actually happen for a
container object (because you can't control what happens inside the
container). So I spent a bit of time trying to figure out how it could
happen. It turns out to be more likely than you might expect.

Access to a finalized container can happen if a Finalize routine accesses a
container object declared in the same master as the Finalize routine. How
could that happen in real code?

Imagine for a moment that someone needs to instrument a controlled type to
collect some usage information. The program might be structured like:

    package Pack is
        type T is new Ada.Finalization.Controlled with ...
        overriding
        procedure Finalize (Object : in out T);

        type Acc_T is access all T'Class;
        A_List : Acc_T;
    end Pack;

    package body Pack is
        package Usage_History is new Ada.Containers.Vectors (...);
        Finalization_Usage: Usage_History.Vector;

        procedure Finalize (Object : in out T) is
        begin
            -- operations on Finalization_Usage to collect data on the
            -- finalization of objects of T.
        end Finalize;
    end Pack;

When the program finalizes, any allocated objects remaining on A_List will
be finalized when the finalization of the collection for type Acc_T
occurs. However, since finalization occurs in reverse order of declaration,
the object Finalization_Usage will be finalized before the collection for
Acc_T. So those finalizations will be accessing an object that is already
finalized. ["Finalization of the collection" for an access type is defined
in 7.6.1(11/2).]

Note that it doesn't matter where A_List is declared; it only matters where
the access type is declared relative to the container object. It's not
unusual for that type to be declared visibly (this program structure is
taken from the CLAW Windows interface library), and good Ada practice is to
hide things in the body when possible. But doing so can easily cause
finalization anomalies (some of the cases in AI95-00280 were originally
encountered in CLAW).

This argues strongly for at *least* the resolution decided on by the ARG in
ABQ: Program_Error for tampering operations, bounded error (Program_Error or
acts as empty) for operations that just read from the container.

I've added this example to the otherwise empty AI, so that Tucker doesn't
have to remember to add it when he writes it up 90 minutes before the next
ARG meeting starts. ;-)

****************************************************************

