Version 1.1 of ais/ai-10318.txt

!standard 03.03.01 (02)          04-04-05 AI95-00318-2/02
!standard 06.05.00 (17)
!standard 06.05.00 (18)
!class amendment 02-10-09
!status work item 03-05-23
!status received 02-10-09
!priority Medium
!difficulty Medium
!subject Limited and anonymous access return types
!summary
A new extended syntax is proposed for the return statement, providing a name for the new object being created as a result of a call on the function.
This new syntax can be used to support returning limited objects from a function and more generally to reduce the copying that might be required when a function returns a complex object, a controlled object, etc.
The existing ability to return by reference is replaced by the ability to use an anonymous access type as a return type, together with implicit dereference of calls on such functions.
!problem
We already have a proposal for allowing aggregates of a limited type, by requiring that the aggregate be built directly in the target object, rather than being copied into the target.
But aggregates can be used only with non-private types, so objects of a limited private type still could not be initialized at their declaration point. It would be natural to allow functions to return limited objects, so long as the object could be built directly in the "target" of the function call, which could be a newly created object being initialized, or simply a parameter to another subprogram call.
When returning a limited type it may be desirable to perform some other initialization on the object after it has been created, but before returning from the function. This is difficult to do while still creating the object directly in its "final" location.
Currently functions that return a limited private type may have an accessibility check performed on the object returned, depending on a property ("return-by-reference-ness") which is not generally visible based on the partial view of the type. This means that a function that works initially may stop working if the full type of the result type is changed to include, say, a limited tagged component, or some other component that is return-by-reference.
A function whose result type turns out to be return-by-reference cannot be allowed where a new object is required. However, there is nothing in the declaration of such a function that indicates it returns by reference.
The capability to return-by-reference could be useful for non-limited types, but it becomes even more useful if a call on such a function could be treated as a variable, so it could be used on the left-hand side of an assignment. An alternative which provided these capabilities without introducing the conceptual oddity of return-by-reference would be welcome. If a function could return an access value which was then implicitly dereferenced at the call site, the effect of return-by-reference could be accomplished, but with a model that is more consistent with other parts of the language.
!proposal
Anonymous access types are permitted for a function result type:
parameter_and_result_profile ::= [formal_part] return subtype_mark | [formal_part] return access_definition
An anonymous access type used as the result type of a function is called an access result type. The accessibility level of an access result type is that of the declaration containing the parameter_and_result_profile.
A call of a function with an access result type may be implicitly dereferenced in any context where a name is permitted, other than as the operand of a type conversion.
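The following is a minimal sketch of how an access result type might be used, not part of the proposed wording. The names Access_Result_Demo, Int_Table, Table, and Elem are hypothetical. The sketch writes the dereference explicitly with ".all"; the implicit form proposed above appears only in a comment, since it depends on this proposal being adopted.

```ada
with Ada.Text_IO;

procedure Access_Result_Demo is
   type Int_Table is array (1 .. 3) of aliased Integer;

   Table : Int_Table := (10, 20, 30);

   --  Hypothetical function with an access result type: it returns a
   --  reference to an element of Table rather than a copy of it.
   function Elem (I : Positive) return access Integer is
   begin
      return Table (I)'Access;
   end Elem;
begin
   --  Explicit dereference: the call is used as a variable name.
   Elem (2).all := Elem (2).all + 1;

   --  Under the implicit dereference proposed here, the same statement
   --  could presumably be written without ".all":
   --     Elem (2) := Elem (2) + 1;

   Ada.Text_IO.Put_Line (Integer'Image (Table (2)));
end Access_Result_Demo;
```

Because Elem returns an access value designating Table (2), the assignment updates the array element itself and no copy of the element is made.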
An extended syntax for the return statement is proposed:
    RETURN identifier : [ALIASED] return_subtype_indication [:= expression] [DO
        handled_sequence_of_statements
    END RETURN];
Such an extended return statement is permitted only immediately within a function. The specified identifier names the object that is the result of a call on the function. If the expression is present, it provides the initial value for the result object. If not, the result object is default initialized. If the handled_sequence_of_statements is present, it is executed after initializing the result object. Within the handled_sequence_of_statements, the identifier denotes a variable view of the result object with nominal subtype given by the subtype_indication. When the handled_sequence_of_statements completes, the function is complete.
Note: An expression-less return statement is permitted within the handled_sequence_of_statements, similar to the way that accept statements work.
A call of a function with a limited result type may be used in the same contexts where we have proposed to allow aggregates of a limited type, namely contexts where a new object is being created (or can be).
  1) Initializing a newly declared object (including a result object identified
     in an extended return statement)
  2) Default initialization of a record component
  3) Initialized allocator
  4) Component of an aggregate
  5) IN formal object in a generic instantiation (including as a default)
  6) Expression of a return statement
  7) IN parameter in a function call (including as a default expression)
In addition, since the result of a function call is a name in Ada 95, the following contexts would be permitted, with the same semantics as creating a new temporary constant object, and then creating a reference to it:
  8) Declaring an object that is the renaming of a function call.
  9) Use of the function call as a prefix to 'Address
In other words, it would be permitted in any context where limited types are permitted. With the new proposals, that is pretty much any context where a "name" that denotes an object or value is permitted, except as the right hand side of an assignment statement.
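A few of the contexts listed above can be sketched as follows. This is an illustration under assumptions, not proposed wording: Lim_Type, Pair, Lim_Ptr, Make_Obj, and Count_Of are hypothetical names, and the comments refer to the context numbers in the list above.

```ada
with Ada.Text_IO;

procedure Limited_Contexts_Demo is
   --  Hypothetical limited type and helpers, for illustration only.
   type Lim_Type is limited record
      Count : Natural := 0;
   end record;

   type Pair is limited record
      Left  : Lim_Type;
      Right : Lim_Type;
   end record;

   type Lim_Ptr is access Lim_Type;

   function Make_Obj (Param : Natural) return Lim_Type is
   begin
      return Result : Lim_Type do
         Result.Count := Param;
      end return;
   end Make_Obj;

   function Count_Of (Obj : Lim_Type) return Natural is
   begin
      return Obj.Count;
   end Count_Of;

   A : Lim_Type := Make_Obj (1);                                   -- context 1
   B : Pair     := (Left => Make_Obj (2), Right => Make_Obj (3));  -- context 4
   C : Lim_Ptr  := new Lim_Type'(Make_Obj (4));                    -- context 3
   D : Lim_Type renames Make_Obj (5);                              -- context 8
begin
   --  Context 7: the call is passed directly as an IN parameter.
   Ada.Text_IO.Put_Line (Natural'Image (Count_Of (Make_Obj (6))));
   Ada.Text_IO.Put_Line
     (Natural'Image
        (A.Count + B.Left.Count + B.Right.Count + C.Count + D.Count));
end Limited_Contexts_Demo;
```

In each case the function result is intended to be built directly in the object being created, so no copy of a Lim_Type value is ever needed.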
!wording
Add after 3.10.2(13):
* The accessibility level of the anonymous access type of an access result type (see 6.5) is the same as that of the associated function or access-to-subprogram type.
Modify 4.1(2):
name ::= ... {| implicit_function_call_dereference}
Add after 4.1(6):
implicit_function_call_dereference ::= function_call
The name in an implicit_dereference or an explicit_dereference shall not be an implicit_function_call_dereference.
Modify 4.1(8) as follows:
The name {or function_call} in a dereference ([either] an implicit_dereference[ or]{,} an explicit_dereference{, or an implicit_function_call_dereference}) is expected to be of any access type. {The function_name or function_prefix of the function_call of an implicit_function_call_dereference shall denote a function with an anonymous access result type.}
AARM Note:
We don't allow a dereference of an implicit dereference. We provide implicit dereferences in a prefix, or as the result of calling a function with an anonymous access type as result type.
Modify 4.1(13):
The evaluation of a dereference consists of the evaluation of the name {or function_call} and the determination of the object or subprogram that is designated by the value of the name {or function_call}. A check is made that [the value of the name] {this value} is not the null access value. Constraint_Error is raised if this check fails. The dereference denotes the object or subprogram denoted by [the value of the name] {this value}.
Add after 4.6(7):
The operand of a conversion shall not be an implicit_function_call_dereference.
Change 6.1(13) to:
parameter_and_result_profile ::= [formal_part] return subtype_mark | [formal_part] return access_definition
Modify 6.3.1(16) as follows:
Two profiles are mode conformant if they are type-conformant, corresponding parameters have identical modes, and, for access parameters {or access result types}, the designated subtypes statically match.
Replace clause 6.5 with the following:
6.5 Return Statements
A return_statement is used to complete the execution of the innermost enclosing subprogram_body, entry_body, or accept_statement.
return_statement ::= simple_return_statement | extended_return_statement
simple_return_statement ::= return [expression];
extended_return_statement ::= return identifier : [aliased] return_subtype_indication [:= expression] [do handled_sequence_of_statements end return];
return_subtype_indication ::= subtype_indication | access_definition
Name Resolution Rules
The result subtype of a function is the subtype denoted by the subtype_mark, or defined by the access_definition, after the reserved word RETURN in the profile of the function. The expression, if any, of a return_statement is called the return expression. The expected type for a return expression is the result type of the corresponding function.
Legality Rules
If the result subtype of a function is limited at the point where the function is frozen (see 13.14), the result subtype shall be constrained.
This second rule is not a necessary restriction, but simplifies implementation dramatically, since it means the caller can allocate space for the result object, perform all "implicit" initializations of task and protected components, worry about accessibility levels for access discriminants, etc. The restriction could be lifted in Ada 201Z.
AARM Ramification:
Note that this rule is defined at the point where a function is frozen rather than at the point of the function declaration, to ensure we are talking about the type characteristics visible inside the enclosing package, rather than the characteristics visible to the caller. Of course compilers are encouraged to signal the error as soon as possible.
A return_statement shall be within a callable construct, and it applies to the innermost callable construct or extended_return_statement that contains it. A return_statement shall not be within a body that is within the construct to which the return_statement applies.
A function body shall contain at least one return_statement that applies to the function body, unless the function contains code_statements. A simple_return_statement shall include a return expression if and only if it applies to a function body. An extended_return_statement shall apply to a function body.
If the result subtype of a function is defined by a subtype_mark, the return_subtype_indication of an extended_return_statement that applies to the function body shall be a subtype_indication. The type of the subtype_indication shall be the result type of the function. If the result subtype of the function is constrained, then the subtype defined by the subtype_indication shall also be constrained and shall statically match this result subtype. If the result subtype of the function is unconstrained, then the subtype defined by the subtype_indication shall be a definite subtype, or there shall be a return expression.
If the result subtype of the function is defined by an access_definition, the return_subtype_indication shall be an access_definition. The subtype defined by the access_definition shall statically match the result subtype of the function. The accessibility level of this anonymous access subtype is that of the result subtype.
If the type of the return expression is limited, then the return expression shall be an aggregate, a function call (or equivalent use of an operator), or a qualified_expression or parenthesized expression whose operand is one of these.
AARM Note:
In other words, if limited, the return expression must produce a "new" object, rather than being the name of a preexisting object (which would imply copying).
Static Semantics
Within an extended_return_statement, the return object is declared with the given identifier, with nominal subtype defined by the return_subtype_indication.
Dynamic Semantics
For the execution of an extended_return_statement, the subtype_indication is elaborated. This creates the nominal subtype of the return object. If there is a return expression, it is evaluated and converted to the nominal subtype (which might raise Constraint_Error -- see 4.6) and becomes the initial value of the return object; otherwise, the return object is initialized by default as for a stand-alone object of its nominal subtype (see 3.3.1). If the nominal subtype is indefinite, the return object is constrained by its initial value. The handled sequence of statements, if any, is then executed.
For the execution of a simple_return_statement, the expression (if any) is first evaluated and converted to the result subtype to become the value of the anonymous return object.
If the result type of a function is a specific tagged type, the tag of the return object is that of the result type.
AARM Ramification:
This is true even if the tag of the return expression is different, which could happen if the return expression were a view conversion or a dereference of an access value. Note that for a limited type, because of the restriction to aggregates and function calls (and no conversions), the tag will already match.
AARM Reason:
This rule ensures that a function whose result type is a specific tagged type always returns an object whose tag is that of the result type. This is important for dispatching on controlling result, and allows the caller to allocate the appropriate amount of space to hold the value being returned (assuming there are no discriminants).
Finally, a transfer of control is performed which completes the execution of the construct to which the return_statement applies, and returns to the caller. In the case of a function, the function_call denotes a constant view of the return object.
Examples of return statements:
    return;                         -- in a procedure body, entry_body,
                                    -- accept_statement, or extended_return_statement
    return Key_Value(Last_Index);   -- in a function body
    return Node : Cell do           -- in a function body, see 3.10.1 for Cell
        Node.Value := Result;
        Node.Succ := Next_Node;
    end return;
Add after 8.1(4):
* an extended_return_statement;
!example
Here is an example of a function with a limited result type using an extended return statement:
    function Make_Obj(Param : Natural) return Lim_Type is
    begin
        return Result : Lim_Type do  -- the "return" object
            -- Finish the initialization of the "return" object.
            Further_Processing(Result, Param);
        end return;
    end Make_Obj;
Here is a similar function that returns an access-to-limited type:
    function Make_Obj(Param : Natural) return access Lim_Type is
    begin
        return Result : access Lim_Type do  -- The "return" object
            Result := new Lim_Type;
              -- storage pool associated with scope where
              -- function declared
            Further_Processing(Result.all, Param);
        end return;
    end Make_Obj;
Here is an abstraction which takes advantage of the implicit dereference of functions with access result types, to support an extensible array abstraction (aka vector):
    generic
        type Element is private;
        type Index is (<>);
    package Extensible_Arrays is
        pragma Assert(Index'First > Index'Base'First);
          -- so can have empty arrays

        type Ext_Array is private;
          -- Extensible array, initially Last(EA) = Index'First-1

        procedure Set_Elem(EA : in out Ext_Array; I : Index; Elem : Element);
          -- Set element, extend array if necessary
          -- Postcondition: Last(EA) >= I

        function Last(EA : Ext_Array) return Index'Base;
          -- Returns index of current last element of array

        function Elem(EA : Ext_Array; I : Index) return access Element;
          -- Refer to existing element
          -- Precondition: I in Index'First .. Last(EA)
          -- Result can be implicitly dereferenced.

        procedure Set_Empty(EA : in out Ext_Array);
          -- Set array back to empty
          -- Postcondition: Last(EA) = Index'First - 1
    private
        type Elem_Array is array(Index range <>) of aliased Element;
          -- We define an array-of-aliased so can implement "Elem"

        type Elem_Array_Ptr is access Elem_Array;
          -- We want a named access type so can use unchecked deallocation

        type Ext_Array is record
            Last : Index'Base := Index'First - 1;
            Data : Elem_Array_Ptr;
              -- This is reallocated as necessary to accommodate at least
              -- Index'First .. Last elements
        end record;
    end Extensible_Arrays;

    procedure Ext_Array_Test(Max : Positive) is
        package Ext_Int_Arrays is
          new Extensible_Arrays(Element => Integer, Index => Positive);
        type Ext_Int_Array is new Ext_Int_Arrays.Ext_Array;

        X : Ext_Int_Array;  -- Initially empty
    begin
        -- Initialize table of squares, extending as necessary
        for I in 1..Max loop
            Set_Elem(X, I, Elem => I**2);
        end loop;

        -- Add one to each of the elements with indices up to Max/2
        for I in 1..Max/2 loop
            Elem(X, I) := Elem(X, I) + 1;  -- Using implicit deref
        end loop;

        -- Now print out the table
        for I in 1..Last(X) loop
            Ada.Text_IO.Put_Line(Integer'Image(I) & " => " &
              Integer'Image(Elem(X, I)));  -- Again using implicit deref
        end loop;

        Set_Empty(X);  -- All done
    end Ext_Array_Test;
!discussion
In meetings with Ada users, there has been a general sense that if limited aggregates are provided in Ada 200Y, it would be desirable to also provide limited function returns which could act as "constructor" functions.
Just allowing a function whose whole body is a return statement returning an aggregate (or another function call) does not give the programmer much flexibility. What they would like is to be able to create the object being returned and then initialize it further somehow, perhaps by calling a procedure, doing a loop (as in the examples above), etc. This requires a named object. However, to avoid copying, we need this object to be created in its final "resting place," i.e. in the target of the function call. This might be in the "middle" of some enclosing composite object the caller is initializing, or it might be in the heap, or it might be a stand-alone local object.
Because the implementation needs to create the result object in a place determined by the caller, it is important that the declaration of the object be distinguished in some way. By declaring it as part of an extended return statement, we have a way for the programmer to indicate that this is the object to be returned. Clearly we don't want to allow extended return statements to be nested.
Because it may be necessary to do some computing before deciding exactly how the result object should be declared, we permit the extended return statement to occur any place a normal return statement is permitted. So different branches of an if or case statement could have their own extended return statements, each with its own named result object.
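As an illustration of separate extended return statements on different branches, here is a small sketch using a nonlimited, unconstrained result type. The names Branch_Demo and Padded are hypothetical. Each branch declares its own result object, with bounds chosen only after some computation.

```ada
with Ada.Text_IO;

procedure Branch_Demo is
   --  Hypothetical helper: pad S with blanks on the right up to Width
   --  characters, or return S unchanged if it is already long enough.
   function Padded (S : String; Width : Natural) return String is
   begin
      if S'Length >= Width then
         --  Result object constrained by its initial value.
         return Result : String := S;
      else
         --  Result object with an explicit constraint, filled in afterwards.
         return Result : String (1 .. Width) do
            Result := (others => ' ');
            Result (1 .. S'Length) := S;
         end return;
      end if;
   end Padded;
begin
   Ada.Text_IO.Put_Line ("[" & Padded ("abc", 6) & "]");
   Ada.Text_IO.Put_Line ("[" & Padded ("abcdefgh", 6) & "]");
end Branch_Demo;
```

The first branch satisfies the rule that an indefinite return subtype needs a return expression; the second gives a definite subtype instead.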
Note that we have allowed the user to declare the result object as "aliased." This seems like a natural thing which might be wanted, so you could initialize a circularly-linked list header to point at itself, etc.
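A sketch of the self-referencing list header mentioned above might look like the following. Header, Empty_List, and Self_Link_Demo are hypothetical names. The use of 'Unchecked_Access is an assumption of this sketch, made to avoid the accessibility rules that would apply to 'Access here; it relies on the limited result being built in place.

```ada
with Ada.Text_IO;

procedure Self_Link_Demo is
   --  Hypothetical list header type. It is limited, so the function
   --  result is expected to be built directly in the target object.
   type Header is limited record
      Id   : Natural := 0;
      Next : access Header;
      Prev : access Header;
   end record;

   function Empty_List return Header is
   begin
      return H : aliased Header do
         --  Assumption: 'Unchecked_Access is used to sidestep the
         --  accessibility check that 'Access would be subject to.
         H.Next := H'Unchecked_Access;
         H.Prev := H'Unchecked_Access;
      end return;
   end Empty_List;

   L : Header := Empty_List;
begin
   L.Id := 42;
   --  If the header was built directly in L, Next designates L itself,
   --  so the Id just stored is visible through the link.
   Ada.Text_IO.Put_Line (Natural'Image (L.Next.Id));
end Self_Link_Demo;
```

If the result were copied rather than built in place, the links would designate a temporary that no longer exists, which is why the no-copy guarantee matters for this use.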
Note that we had discussed various mechanisms where information from the calling context would be available inside the function at the language level. In particular, it would be possible to refer to the values of the discriminants or bounds of the object being initialized, presuming it was constrained, within the subtype indication and initializing expression, if any.
Ultimately this capability was not included in this proposal, as it created a series of somewhat complicated restrictions on usage and made the implementation that much more difficult. Note that the implementation may still need to pass in information from the calling context, depending on the run-time model, because if the type is "really" limited (e.g. it is limited tagged, or contains a task or a protected object), then the new object must be built in its final resting place. In many run-time models, that means the storage needs to be allocated at the call-site if the object being initialized is a component of some larger object.
However, by not allowing the programmer to refer to this contextual information at the language level, we give the implementation more flexibility in how it solves the build-in-place requirement for "really" limited objects. See the discussion below about implementation approaches.
The proposed syntax for extended return statements was discussed a year or so ago, but when this AI was first written up, we proposed instead a revised object declaration syntax where the word "return" was used almost like the word "constant," as a qualifier. This was somewhat more economical in terms of syntax and indenting, but was not felt to be as clear semantically as this current syntax.
We have eliminated the capability for returning by reference, in favor of returning a value of an anonymous access type, coupled with implicit dereference of a call on such a function. An alternative proposal (AI-318-1) proposed to make return-by-reference a separate capability, triggered by the presence of the reserved word "ALIASED" in the function profile. This was felt by some reviewers to be enshrining the confusing notion of return-by-reference, which earlier had been buried in a discussion of certain limited types. Furthermore, the implementation model of return by reference was clearly to return a "reference" (effectively an access value) to the result object. Making this explicit presumably makes the feature easier to understand, and we can also piggyback on the usual accessibility checks, rather than have to invent special ones associated with a return by reference.
The capability to return an anonymous access type goes well with the other changes allowing anonymous access types in more contexts. We have kept the implementation simple by making the accessibility level of the result type the same as that of the associated function (or access-to-subprogram type).
The implementation of the extended return statement for non-limited types should minimize the number of copies, but may still require a copy in some implementation models and in some calling contexts.
The implementation of the extended return statement for limited result types is straightforward if the result subtype is constrained. It is essentially equivalent to a procedure with an OUT parameter -- the caller allocates space for the target object, perhaps does some of the "implicit" initialization for tags, discriminants, tasks, or protected components, etc., and passes its address to the called routine, which uses it for the "return" object. Nonlimited controlled components can still require some fancy footwork, since they can be explicitly initialized, so default initializing them would be inappropriate. But compilers already have to deal with returning non-limited controlled objects, so presumably this won't create an insurmountable burden.
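The analogy with an OUT parameter can be sketched at the source level as follows. This is only a model of what an implementation might do, under the assumption of a constrained limited result. Lim_Type, Make_Obj, and Make_Obj_Model are hypothetical names, and no particular compiler is claimed to work this way.

```ada
with Ada.Text_IO;

procedure Build_In_Place_Model is
   type Lim_Type is limited record
      Count : Natural := 0;
   end record;

   --  Source-level form using an extended return statement.
   function Make_Obj (Param : Natural) return Lim_Type is
   begin
      return Result : Lim_Type do
         Result.Count := Param;
      end return;
   end Make_Obj;

   --  Assumed model: the caller supplies the (already default-initialized)
   --  target object, and the callee fills it in.
   procedure Make_Obj_Model (Result : out Lim_Type; Param : Natural) is
   begin
      Result.Count := Param;
   end Make_Obj_Model;

   A : Lim_Type := Make_Obj (7);
   B : Lim_Type;
begin
   Make_Obj_Model (B, 7);
   Ada.Text_IO.Put_Line (Natural'Image (A.Count) & Natural'Image (B.Count));
end Build_In_Place_Model;
```

Both forms leave the caller's object holding the same value; the difference is that the function form can be used to initialize an object at its declaration.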
If a limited result subtype could be unconstrained, the implementation might be significantly more complex. The details are discussed in AI-318-1. We do not allow this in this proposal (nor in that one, for that matter). Note that by disallowing unconstrained result types, we also eliminate issues relating to access discriminants of limited types, which have special accessibility checking issues.
Supporting a function result of an anonymous access type presents no special challenges since we have defined the accessibility level of the result type to be the same as that of the associated function or access-to-subprogram declaration. Hence, it is as though a named access type were declared and then used as the result type, from a run-time model point of view. There is no need for any (new) run-time accessibility checking.
Supporting implicit dereference of functions with an anonymous access result type will require some work in the overloading phase, but anonymous access types are already implicitly convertible to various named access types, so this is essentially adding implicit "convertibility" to the designated type.
There was some concern about what would happen if an exception were propagated by an extended return statement, and then the same or some other extended return statement were reentered. There doesn't seem to be a real problem. The return object doesn't really exist outside the function until the function returns, so it can be restored to its initial state on call of the function if an exception is propagated from an extended return statement. Once restored to its initial state, there seems no harm in starting over in another extended_return_statement.
!ACATS test
ACATS test(s) need to be created for these features.

!appendix

From: Tucker Taft
Sent: Thursday, April  1, 2004,  6:13 AM

I have been asked to prepare an alternative to AI-318
which drops the notion of "aliased" return-by-reference
functions, and replaces it with a simplified version
of anonymous access type return.  One thing that is
being lost in this process is that return-by-reference
eliminates the need for ".all" at the call site.
However, it struck me that we already allow implicit
dereference in a number of contexts, and since
anonymous access types as return types is a new
feature, it would be feasible to allow implicit
dereference of calls of such functions in *any* context.

Allowing implicit dereference has some advantages:

   1) It provides better compatibility with the existing
      (albeit limited) return-by-reference capability,
      because call sites would not have to change, only
      the function would change to return X'access rather
      than X (or Y rather than Y.all).  Implicit dereference
      would eliminate the need for a .all at the call sites.

   2) C++ has a return-by-reference capability ("&" return type)
      which allows a natural way to use a call on a function as
      the left hand side of an assignment, allowing the implementation
      of "abstract" arrays, e.g.:
         Arr(X) := Arr(X) + 1;
      where "Arr" is actually a function that implements an array-like
      data structure.

      We could get much of this same capability by allowing
      functions declared to return an anonymous access type to
      be implicitly dereferenced in any context.  Furthermore,
      since Ada uses "()" for both array indexing and function
      calling, this would actually get some value out of that
      syntactic unification (or as Robert might call it,
      "confusion" ;-).

      This is actually better than the "aliased" return-by-ref
      capability, since in that case the returned object was
      necessarily considered a constant.  Of course if the
      writer of the function wanted the result to be access-to-constant,
      they could declare it that way.

   3) Similar to above, but relevant to me because Bob Duff and I
      have recently been sparring over an issue that would be nicely
      solved by implicit dereference:  As in many text- and language-
      processing tools, we convert all strings into unique IDs as soon
      as we read the source file.  We call these unique IDs "spellings,"
      LISP used to call them "symbols," and I have seen them called
      String-IDs and a number of other similar things.  They
      significantly simplify further processing because string equality
      involves a simple ID equality comparison, and these IDs can
      be efficiently passed and returned from subprograms without
      any of the issues associated with passing and returning
      unconstrained arrays.

      *However*, when it comes to passing these IDs to
      subprograms that expect Strings, we have to convert
      the ID back to a String.  The simplest way to do this is to
      write a function, say To_String, which takes an ID and returns
      a String.  Unfortunately, that immediately gets you back into
      the inefficiencies of returning unconstrained arrays.  An
      alternative is to expose the representation of the IDs, and
      allow the caller to explicitly use ".all" or a component selection
      to retrieve the String at the call site, but that clearly makes
      the "abstraction" a bit less abstract.

      By allowing implicit dereference of functions returning
      anonymous access types, we could have the best of both worlds.
      The To_String function could actually return "access constant
      String" instead of String, but it could still be used in
      any context that required a String without the overhead of
      returning unconstrained arrays.  This would preserve both
      abstraction and performance.

So, barring major objection, I am going to propose that calls
on functions returning anonymous access types will permit
implicit dereference in any context (instead of only in front
of ".", "(", and "'").

Comments welcomed...


From: Pascal Leroy
Sent: Monday, April  5, 2004,  10:47 AM

Tuck wrote:

> Here is an alternative proposal, which drops
> "aliased return blah" (return-by-reference) in
> favor of "return access blah."  It still includes
> functions returning limited types.

It took me a while to realize that this AI really has two proposals:

1 - Functions returning anonymous access types.  That includes implicit
dereferencing, but as I see it the extended_return_statement is not
necessary for this part.

2 - Improvements for functions returning limited types.  This is the
part that really needs the extended_return_statement.

The more I look at the AI, the more I like #1 (especially with implicit
dereferencing and the capability to have a function call on the LHS of
an assignment) and the less convinced I am about #2.  Yeah, it would be
nice to improve the usability of limited types, but the baggage needed
to do that (and the somewhat arbitrary restrictions that come with it)
sounds clunky to me.

What do others think?


From: Tucker Taft
Sent: Monday, April  5, 2004,  11:13 AM

Pascal Leroy wrote:
> Tuck wrote:
> > Here is an alternative proposal, which drops
> > "aliased return blah" (return-by-reference) in
> > favor of "return access blah."  It still includes
> > functions returning limited types.
> It took me a while to realize that this AI really has two proposals:
> 1 - Functions returning anonymous access types.  That includes implicit
> dereferencing, but as I see it the extended_return_statement is not
> necessary for this part.
> 2 - Improvements for functions returning limited types.  This is the
> part that really needs the extended_return_statement.

I believe I was directed to keep these two proposals as part of a single AI.

> The more I look at the AI, the more I like #1 (especially with implicit
> dereferencing and the capability to have a function call on the LHS of
> an assignment) and the less convinced I am about #2.  Yeah, it would be
> nice to improve the usability of limited types, but the baggage needed
> to do that (and the somewhat arbitrary restrictions that come with it)
> sounds clunky to me.
> What do others think?

I think this is the key thing to make limited types more useful.
With this change, making a type limited allows the implementor to
control all cases of copying, without dramatically undermining
the usability of the type, and with almost no negative performance


From: Randy Brukardt
Sent: Tuesday, April  6, 2004,  3:53 PM

The only reason that I would ever vote for (1) would be if it was the only
way to get (2). If we don't want to handle the limited functions, then we
need do nothing for return-by-reference.

Visible access parameters and results in modern programs should be
discouraged; used only when there is absolutely no other choice. (If we'd
have "in out" on functions, there would never be a need for them.)

As far as the implicit dereference goes, I've been waiting for the expected
"April Fool" that goes with it. Since I've been waiting a week, I suppose it
is actually a serious proposal. I find it completely bizarre, because it
ruins the model of implicit dereference (it occurs only before '.' or '()').
Moreover, why

     type Int_Access is access all Integer;
     function anon return access all Integer;
     function IA return Int_Access;

   I := Anon; -- Legal.
   I := IA; -- Illegal.

should behave differently is going to be just too goofy to explain.

OTOH, (2) will not only eliminate arbitrary restrictions from limited types,
but it also will make code more readable anytime that it takes multiple
steps to create a result. (And should allow the generation of better code as
well by building in place more often.)

