Version 1.1 of ais/ai-00254.txt

Unformatted version of ais/ai-00254.txt version 1.1
Other versions for file ais/ai-00254.txt

!standard 03.10 (11)          00-12-06 AI95-00254/00
!class amendment 00-12-06
!status received 00-12-06
!priority Low
!difficulty Hard
!subject Downward closures for access to subprogram types
!summary
!problem
!proposal
!discussion
!example
!ACATS test
!appendix

From: Robert A Duff
Sent: Friday, November 03, 2000 7:56 AM

Here's more mailbox stuffing...
I guess it's better than ballot box stuffing.
Not as good as turkey stuffing.

Erhard says:

> We don't have an AI for that. It was a proposal for an Enhancement
> ('Unrestricted_Access) discussed at the Westboro Mtg, 9/99. At the
> time, the argument against it was a technical difficulty in supporting
> it for subprograms within generics. There was supposed to be some old
> MRT documentation, exploring this issue (an LSN ?), in addition to
> Randy's Prototyping report.
> Your action item is to do some archeology to see whether you can
> retrieve and distribute this LSN or whatever. (It was not an action
> item to rethink the issue in the absence of such documentation.)

OK, I dug up the original LSN on the subject of downward closures.

================

!topic LSN on General Access Types
!key LSN-1083 on General Access Types
!reference RM9X-3.10.2;4.0
!reference RM9X-3.10.2(16);4.0
!reference AARM-12.3(12.p,12.q);4.0
!reference LSN-1042 on Accessibility Checks in Generics
!from Bob Duff $Date: 2000/12/07 03:49:45 $ $Revision: 1.1 $
!discussion

Two issues related to access types and the accessibility rules came
up again at the recent DR/XRG meeting in the UK:

    - The rules prevent "downward closures".

    - The rules are applied in an assume-the-worst manner in generic
      bodies.

These issues keep coming up, because they represent clearly-desirable
functionality, but with a substantial implementation cost (at least for
some implementations).  It is difficult to make the trade-off.  We
recommend not supporting downward closures, but solving the generic
problem by going over to run-time accessibility checks in generics.
Below, we discuss the issues, and the reasons for our recommendation.

LSN-1042, published in October, 1992, discusses in general the
relationship between accessibility checks and the generic contract
model.

----------------

Downward Closures:

Ada's access-to-subprogram types represent assignable subprogram
values.  They support the ability to create data structures
containing the identities of subprograms.  They support callbacks,
where one software component declares a subprogram and hands its
identity to another software component, which can then "call back"
the first software component at a later time.

These types do not support downward closures, where the identity of a
more nested subprogram can be passed to a less nested subprogram.
Downward closures can be used, for example, to implement iterators,
like this:

    package P is
        type Integer_Set is private;
        ... -- various operations
        type Integer_Action is access procedure(I: Integer);
        procedure Iterate(S: Integer_Set; Action: Integer_Action);
            -- Call Action on each member of S.
    end P;

    function Count(S: Integer_Set) return Natural is
        Result: Natural := 0;
        procedure Increment(I: Integer) is
        begin
            Result := Result + 1;
        end Increment;
    begin
        Iterate(S, Increment'Access); -- Illegal!
        return Result;
    end Count;

Increment'Access above is illegal, because Increment is more nested
than type Integer_Action.  It would be bad to have to make Increment
global, because that would require making the variable Result global.
It is a bad idea to force the use of global variables in a language
that has tasks -- in this example, Count would not be reentrant if
Result were global.

Note that this situation is typical of iterators and similar things
-- in typical use, the action passed to an iterator will often need
visibility upon a variable declared in the caller of the iterator, in
order to accumulate some sort of result.

The reason for the restriction is that access-to-subprogram types
allow assignment.  At the point where Iterate is called, we don't
know if the body of Iterate is going to save the access value in a
global variable.  We disallow passing the nested Increment procedure
because Iterate *might* save a pointer to it, and call it later,
after Increment (and the variable Result) no longer exist.

Pascal supports downward closures, but it does not support Ada's
callback style, because the identity of a subprogram being passed as
a parameter cannot be copied.  Modula-2 supports the callback style,
but not the downward closure style.  Languages like Lisp support a
much more general form of closure, which is not of interest to us,
since it requires "stack frames" to be heap allocated -- "heap
frames", really, in the general case.  C supports the callback style,
but since there are no nested functions, the question of downward
closures does not arise.  There are no tasks, either, in C, so making
variables global (or C's "static") does not cause as much trouble as
in Ada.  (Supporting threads in C presents some interesting
problems!)

Implementation Issues: In a language (like Ada) with nested
subprograms, each subprogram needs some way to access its environment
-- that is, the variables declared outside it.  There are two common
methods for representing this environment: static links, and
displays.  Most Ada compilers use static links, but some use
displays.  One key difference between the two methods is that static
links have a compile-time-known size (32 bits, on a typical machine),
whereas the size of a display depends on the maximum nesting depth
throughout the partition.

In an implementation with static links, an access-to-subprogram value
is generally represented as a pair of addresses -- the address of the
subprogram's code, and the static link.  However, given the current
Ada 9X rules, an access-to-subprogram type that is declared at
library level does not need the static link -- it can be represented
as just a code address.  This is nice, because it is efficient, and
it allows the representation to easily match C's typical
representation of function pointers.  Access-to-subprogram types
declared one level deeper than library level can also be represented
as just a code address.  However, two or more levels deeper requires
the static link.

In an implementation with displays, given the current rules, an
access-to-subprogram value can be represented as just a code address
-- it is not necessary to attach a display to each access value.
This also has the nice interoperability-with-C property.

It has been suggested that downward closures be supported by allowing
the Unchecked_Access attribute on subprograms.  (It is currently
allowed only on objects.)  The semantics would presumably be that the
access value returned is valid (the designated subprogram can be
called) so long as the designated subprogram (and it's containing
stack frame) still exist.  If it doesn't exist, a dereference would
be erroneous.  We do not recommend this method for these reasons:

    - It would be unsafe -- it is uncomfortable to introduce
      erroneousness to a high-level feature like downward closures.

    - It would require every access-to-subprogram value to carry a
      static link or display, because the nesting level of the
      designated subprogram would no longer be restricted at
      compile time.  This would break the interoperability-with-C
      property, and would be less efficient.  (One could imagine
      an implementation using a different representation when the
      Convention is C, but that adds complexity, and doesn't solve
      the efficiency problem in the Ada-only case.)

There are two ways to support downward closures that we would
recommend, if downward closures are to be supported.  Both ways
recognize the fact that downward closures really need a separate
feature from the callback style of access-to-subprograms:

    - Limited access-to-subprogram types.

    - Access-to-subprogram parameters.

Limited access types were in an earlier version of Ada 9X.  Since
they would be limited, assignment would be disallowed, so
downward-closure-style parameter passing could be allowed, and would
be safe.  The compiler would implement limited access-to-subprogram
types with a static link or display.  Non-limited ones would retain
the interoperability-with-C property.  The syntax would be something
like this:

    type Integer_Action is limited access procedure(I: Integer);
                        -- ^^^^^^^
    procedure Iterate(S: Integer_Set; Action: Integer_Action);
        -- Call Action on each member of S.

Type conversion from limited to non-limited would be illegal.

Access-to-subprogram parameters would be an extension to access
parameters (not to be confused with parameters of a named access
type).  The syntax would be something like this:

    procedure Iterate(S: Integer_Set;
                      Action: access procedure(I: Integer));
        -- Call Action on each member of S.

This is quite similar to the syntax used in Pascal.  As for any access
parameter, the access type would be anonymous.  Parameter type matching
would be based on the designated profile.  Access parameters already use
run-time accessibility checks -- this new kind would do likewise.  Thus,
assignment would be allowed, but the accessibility checks would prevent
dangling pointers.  In our iterator example, if Iterate were to try to
copy Action to a global variable (during the call to Iterate from
Count), then Constraint_Error would be raised, because Count.Increment
is too deeply nested.  The value of an access-to-subprogram parameter
would need to have a static link or display.  Library-level access types
could retain the interoperability-with-C property.

Of the two workable methods outlined above, the access-to-subprogram
parameters method would be somewhat simpler in Reference Manual
terms, because it would be able to piggyback on the existing access
parameters feature for its rules and semantics.  The implementation
complexity seems equivalent.  Efficiency of the limited access types
method would be slightly better, because it would not be necessary to
pass in information needed for the run-time accessibility checks.
Limited access types would be the first elementary limited types;
we would have to reword the parameter passing rules to avoid a
contradictory requirement to pass by copy and by reference.  (It
doesn't really matter which is done, actually.)

For implementations with static links, the implementation of either
method is straightforward.  Such implementations already need to
include a static link in at least some access-to-subprogram values,
and the downward-closure ones would be the same.

However, for implementations with displays, either method is an
implementation burden, because:

    - The current rules do not require carrying the display
      with the access-to-subprogram value, whereas the new
      rules would.

    - The size of the display is not easily known at compile time.
      It's size can be calculated at run time, or a maximum possible
      size can be known if there is a restriction on nesting depth.
      Either way, managing the passing of the unknown size display
      causes complexity.  It is certainly *possible* to implement:
      I know of at least one Pascal compiler that used displays,
      and implemented downward closures.  Gnu C supports nesting
      of functions as an extension, and supports pointers to them.
      And Ada has other features that require the management of
      unknown-sized data structures.  Nonetheless, the implementation
      is not trivial.

It should also be noted that an implementation with displays would be
less efficient than one with static links, because an indirect call
would require copying the entire display.  It's not that big of a
deal, however -- it just means copying several words of memory, and
there would be no *distributed* overhead.

Note also that there are workarounds: Subprograms can be passed as
generic formal parameters, so an iterator can often be implemented as
a generic formal parameter.  However, that prevents recursion, and is
more verbose.  A closure can also be represented as an
access-to-class-wide value.  In the iterator example, whatever state
is needed by the Action procedure would be encapsulated in a type
extension.  However, this method is even more verbose.

One possibility would be to not support downward closures directly as
access-to-subprogram types, but to make the workaround a bit easier.  In
particular, we could allow limited tagged types to be extended in a more
nested place than the parent type.  (The accessibility rule would remain
as is for non-limited type extensions.)  Instead, a rule would be needed
for allocators, similar to the current rule for types with access
discriminants.  This relaxation is possible for limited types, because
they have no assignment, and hence cannot be copied to a place where the
object lives longer than its type.  This would allow the following:

    package P is
        type Integer_Set is private;
        ... -- various operations
        type Closure is tagged limited null record;
        procedure Action(C: Closure; I Integer);
        procedure Iterate(S: Integer_Set; C: Closure);
            -- Call Do_It(Action, M) for each member M of S.
    end P;

    function Count(S: Integer_Set) return Natural is
        type My_Closure_Type is new Closure with null record; -- Illegal!
        Result: Natural := 0;
        procedure Action(C: Closure; I: Integer) is
        begin
            Result := Result + 1;
        end Action;
        My_Closure: My_Closure_Type;
    begin
        Iterate(S, My_Closure);
        return Result;
    end Count;

The above is less verbose than the current workaround, and has the
advantage that My_Closure_Type can be declared where it arguably belongs
-- inside Count.

The implementation could store the static link or display in each object
of a limited type extension that was declared at a more-nested level
than its parent.  Code would be added to each potentially primitive
subprogram of such a type, to move the static link or display from the
parameter object to its proper location.  Note that no change to the
call sites is necessary.  Nonetheless, there is some implementation
burden, because extra compiler work is needed for declaring these
objects, and in the bodies of the potentially primitive subprograms.

(The reason I say "potentially" primitive is that because of renaming,
we don't know in general while compiling a given subprogram that it will
be primitive.  Potentially primitive subprograms are essentially the
ones declared at the same nesting level, and that take a parameter of
the type.)

We (reluctantly) recommend not supporting downward closures, because:

    - An implementation burden exists for display-based implementations.

    - Workarounds exist.

    - It seems like too big a change to make to the language at this
      late date.

I say "reluctantly" recommend, because it is clear to me that the
functionality is desirable, and if we were designing Ada 9X from
scratch, we would support downward closures.

----------------

Generic Bodies:

Currently, accessibility rules are checked in generic bodies in an
assume-the-worst manner (see 3.10.2(16) and 12.3(12.p,12.q)).  This
means that such rules are checked assuming the instantiation will be
more nested than the generic unit.  This is unfortunate, since most
of the time, the instance will be at the same nesting level as the
generic unit; the rules are therefore more pessimistic than is
usually necessary.

In many cases, it is possible to work around the problem by moving
something from the generic body up to the private part.  For example,
suppose P is a subprogram declared in a generic body, and the body
wishes to do P'Access, of an access type declared outside the generic
unit.  This is illegal, but one can move the declaration of the
subprogram to the private part of the generic, and declare a
constant, also in the private part:

    P_Access: constant Global_Access_To_Subprogram_Type := P'Access;

Similarly, type extensions can be declared in the private part of the
generic when they are illegal in the body.

For the Access attribute for objects, a different workaround is
possible -- use Unchecked_Access instead.  Unfortunately, this throws
away all the benefit of the accessibility checks, and can lead to
erroneous execution if one is not careful.

Another case is for type conversion of an access type declared in the
generic unit (spec or body) to a type global to the generic unit.
This is currently illegal.  A possible workaround is to add a
conversion function to the generic body:

    function Convert(Ptr: access ...) return Global_Access_Type is
    begin
        return Global_Access_Type(Ptr);
    end Convert;

Then, if X is of the inner access type, Convert(X) will do a run-time
check, which will fail if and only if the current instance is more
nested than the generic unit.

All of the above workarounds are annoying, because they cause trouble
when converting a package into a generic package -- one has to apply
the above workarounds after having written and tested the non-generic
package.

One possible solution to the above problems is to define some extra
syntax, or a pragma that means "this generic unit will be
instantiated only at the same nesting level".  However, extra syntax
for a small issue seems like too big a change at this point.  And a
pragma with such a heavy semantic effect seems ugly.

Perhaps a better solution would be to say that all accessibility
rules are checked at run time in instance bodies.  (Doing things at
run time is one way of getting around generic contract model
problems in general, and it would work here.)  An implementation that
does macro expansion for generic instantiations could always detect
these "run-time" errors at macro-expansion time.  An implementation
that does code sharing would have to generate code to do the run-time
checks.

One problem with either solution is that there is a substantial
implementation burden for implementations that do universal generic
code sharing.  For such implementions, when a subprogram declared
inside the generic unit is called, it needs to be implicitly passed
an extra parameter that represents the current instance.  This extra
parameter is used by the subprogram to address variables declared in
the generic body.

Since such subprograms have a different calling convention, if one
allows access values pointing at them to escape outside the generic,
then one has a problem.

We recommend using the run-time accessibility checking solution.
However, because of the implementation problem, we recommend using this
solution only for access-to-object types, and not solving the problem
for access-to-subprogram types.  The implementation problem mentioned
above does not exist for access-to-object types.  This solution is not
entirely consistent, but note that it is already the case that
access-to-subprogram types are different: there are no anonymous
access-to-subprogram types, and there is no Unchecked_Access attribute
for these types.

----------------

SUMMARY:

Downward closures can be sensibly supported, and are quite desirable,
but cause implementation difficulties for display-based
implementations.  Therefore, we recommend not solving this problem.

Removing the accessibility restrictions on generic bodies can also be
sensibly supported, and is also desirable, but (for access-to-subprogram
types) presents severe implementation difficulties for implementations
that do universal sharing of generic bodies.  We recommend solving the
problem for access-to-object types by using run-time checks.  We
recommend not solving the problem for access-to-subprogram types.


****************************************************************

From: Robert A Duff
Sent: Friday, November 03, 2000 7:58 AM

I wrote:

> OK, I dug up the original LSN on the subject of downward closures.

Since LSN-1083 refers to LSN-1042, I'll send that along, too, though I'm
not sure how relevant it is.

================

!topic LSN on Accessibility Checks in Generics
!key LSN-1042 on Accessibility Checks in Generics
!reference MS-12;4.6
!from Bob Duff $Date: 2000/12/07 03:49:45 $ $Revision: 1.1 $
!discussion

This Language Study Note discusses accessibility checks, and their
relationship to the Ada 9X version of the generic contract model.

Ada 83 originally intended to support a "contract model" of generics,
in which the generic specification forms a contract between clients of
the generic and the body of the generic.  However, there were certain
holes in the contract model.  The Ada 9X requirements document
directs us to address these problems.  Requirement R4.4-B(1),
"Dependence of Instantiations on Bodies" states that "A user must
have the ability to compile a generic instantiation before compiling
the corresponding generic body."  Study Topic S4.4-B(2), "Tighten the
Contract Model" states that "Ada 9X should ensure that compatibility
of a generic instantiation with the corresponding generic body is
based solely on the compatibility of each with the generic
specification."  Although this is a study topic, we see no way of
achieving R4.4-B(1) without also achieving S4.4-B(2).  Therefore, we
have tried to ensure that there are no holes in the Ada 9X generic
contract model.  We have patched the holes of Ada 83, and avoided
adding new ones related to new Ada 9X features.

We agree with the Requirements document -- it is essential to have an
air-tight contract model.

One might think that the generic formal part would contain all the
information that is necessary to determine whether an instantiation
is legal.  However, we have found that to be infeasible.  Therefore,
we have extended the contract model to include the entire
specification of the generic, including the private part, if any.
This means that compile-time rules are checked in the instance
specification.  On the other hand, compile-time rules are not checked
in the instance body; instead, they are checked in an "assume the
worst" manner in the generic body.  Assume the worst means that we
ensure that ANY instance body will be legal.

In MS;4.6, certain "scope checking" rules were defined.  We are now
calling these "accessibility rules".  These rules are intended to
prevent dangling references.

Here's how the Ada 9X generic contract model relates to the
accessibility rules.  Within a generic body, we must "assume the
worst", which in this case means that we must assume that the generic
will be instantiated at a place that is more nested than the generic
declaration.  This implies, for example, that one cannot declare a
type extension in a generic body.  (Type extensions must be declared
at the same level as the parent type.)

However, within the specification of the generic, we defer the check
until instantiation time, when we know the actual nesting level.  This
means that if a type extension appears in a generic spec, an
instantiation that appears at the same level as the generic declaration
will be legal, whereas a more nested instantiation will be illegal.
(Remember that the "nesting" we're talking about here does not include
nesting within a package.)

Here's an example:

    package Library_Pack_1 is
        generic
            type T is tagged private;
        package G is
            -- Note to clients: You must instantiate this at library
            -- level.
            ...
        private
            type T2 is new T with ...;
        end G;
    end Library_Pack_1;

    package Library_Pack_2 is
        ...
    end Library_Pack_2;

    package body Library_Pack_2 is
        package Instance is new G(...); -- Legal.

        procedure Nested is
            package Bad_Instance is new G(...); -- Illegal!
        begin
            ...
        end Nested;
    end Library_Pack_2;

The programmer really wanted to declare type T2 in the body of G.
However, that would be illegal.  But at least there is a workaround:
T2 can be declared in the private part of G.  This may not be ideal,
but it is the only way we have found to preserve the contract model.
In practice, the restrictions are not a problem, since most
instantiations can be placed at library level without difficulty.  It
might seem more consistent to treat the spec of G in the same way as
the body.  However, that would remove an important set of
capabilities.  Since bodies generally assume the worst, one needs a
way to get around various rules -- that way is to put declarations in
the private part of the generic.

Given the contract model that we have chosen, it would be strange to
do the accessibility checks in a different way.

****************************************************************

From: Randy Brukardt
Sent: Friday, November 03, 2000 11:40 AM

Thanks, Bob, now I don't have to try to find this stuff.

I have one quibble with this LSN:

> One problem with either solution is that there is a substantial
> implementation burden for implementations that do universal generic
> code sharing.  For such implementions, when a subprogram declared
> inside the generic unit is called, it needs to be implicitly passed
> an extra parameter that represents the current instance.  This extra
> parameter is used by the subprogram to address variables declared in
> the generic body.

This really is misleading, in that it identifies the problem as one that
occurs only with universal sharing. If that was the case (and given that RR
is the only compiler that does this), a reasonable result would be to say
"universal sharing isn't worth it, let's have this feature anyway".

However, the problem is actually much broader. It occurs with any
implementation of sharing that doesn't look in the body (i.e. one that
shares based on the contract of the specification). Unless the
implementation knows for sure that there aren't any variables in the body,
it will have to somehow provide a method of instancing those variables. That
means somehow passing an instance pointer into the subprogram (as described
above).

Thus, relaxing this rule makes not only universal sharing impossible, but as
well any version of sharing based only on the contract. Essentially, with a
relaxed rule, sharing is only possible if the body does not contain a
subprogram 'Access or doesn't contain any local objects. (I would expect
that implementations wouldn't bother with both cases if they did any sharing
at all; the second case almost never would happen, given it would require
both no objects and instantiations that are virtually identical.)

This would render moot the discussions about contract enhancing pragmas and
the DEC patent. It also means that an implementation which supported any
kind of sharing could not make a decision about how to generate a generic at
the instantiation site, as the body would control whether or not sharing was
possible. Personally, I think this would prevent any implementations from
bothering with sharing at all.

****************************************************************

From: Dan Eilers
Sent: Friday, November 03, 2000 2:33 PM

Randy wrote:
> This really is misleading, in that it identifies the problem as one that
> occurs only with universal sharing. If that was the case (and given that RR
> is the only compiler that does this), a reasonable result would be to say
> "universal sharing isn't worth it, let's have this feature anyway".

The RR compiler is of course _not_ the only compiler that does universal
sharing of generics.

****************************************************************

From: Dan Eilers
Sent: Friday, November 03, 2000 2:20 PM

Motivation for adding downward closures to Ada includes the bad press Ada
gets for not supporting this programming paradigm, such as the paper:
"Using Procedural Parameters and Continuations in Combinatorial Searches"
in Software - Practice and Experience, Volume 24, 1994, pp 377-386, by
Wen-Ping Hwang and Ching-Lin Wang.

The conclusion of this paper states:

"Although the code is given in Pascal, the technique is applicable in other
imperative languages such as Modula-2, and impurely functional languages
such as ML and Scheme that allow side-effects.  However, it is not
applicable in C, since functions may not be defined within other functions,
and Ada, since recursive instantiations of generic subprograms are disallowed."

****************************************************************

From: Robert A Duff
Sent: Monday, November 06, 2000 8:06 AM

Thanks for the info, Dan.

> The conclusion of this paper states:
>
> "Although the code is given in Pascal, the technique is applicable in other
> imperative languages such as Modula-2, and impurely functional languages
                               ^^^^^^^^
> such as ML and Scheme that allow side-effects.  However, it is not
> applicable in C, since functions may not be defined within other functions,
> and Ada, since recursive instantiations of generic subprograms are disallowed."

The part about Modula-2 sounds fishy to me.  Modula-2 does not have
downward (inward?) closures.  I found this a disappointment when moving
from Pascal to Modula-2, because I had thought that Modula-2 was Wirth's
"Better Pascal" (which in many other respects it is).

Modula-2 has pointer-to-procedure types, but they can't point to nested
procedures.  This is essentially the same restriction we have in Ada
with the acessibility rules: you can't pass a nested procedure as a
parameter to an outer procedure.

I haven't read the paper, so I don't know if their technique requires
downward closures.

****************************************************************

From: Robert Dewar
Sent: Monday, November 06, 2000 9:43 AM

I suppose we could dig up the old proposals in this area. I must say that
if the strongest argument is alledged "bad press" I find that very weak.
What would be convincing would be real customer experience. We have not
had any demands in the GNAT community, but that may be deceptive because
of the availability of Urestricted_Access in GNAT -- which indeed is
very usefiul, and which we use ourselves and find critically important.

****************************************************************


Questions? Ask the ACAA Technical Agent