Version 1.2 of acs/ac-00084.txt

Unformatted version of acs/ac-00084.txt version 1.2
Other versions for file acs/ac-00084.txt

!standard 6.04(00)          03-12-05 AC95-00084/01
!class amendment 03-12-04
!status received no action 03-12-04
!status received 03-10-14
!subject Performance of unconstrained function calls
!summary
!appendix

From: Martin Dowie
Sent: Tuesday, October 14, 2003  5:20 AM

Functions that return unconstrained results incur overhead
as the result is copied back. An option is to convert the
call into a procedure and get the benefit of the pass-by
reference mechanism of an 'out' parameter (ok, the RM doesn't
force this, but I've yet to see an implementation that doesn't
do 'the right thing' :-).

The pain with converting to use a procedure is that often
the function is an infix operator:

   declare
      Result : constant UC_Array := X + Y;  -- Nice, but slow
   begin
      ...
   end;

   declare
      Result : UC_Array (Some_Constraint)
   begin
      Result := Add (X, Y);  -- Ugly, but fast
      ...
   end;

I'm wondering if it would be possible to create an extension
to 'renames' that would allow 'Nice and fast', e.g.

   package Test_Renames is

      type UC_Array is array (Something range <>) of Whatever;

      procedure Foo (A, B : UC_Array; C : out UC_Array);

      function "+" (A, B : UC_Array) return UC_Array
         renames procedure Foo;
      --  Tells the compiler that we want the infix notation
      --  but make the passing mechanism that of the procedure
      --
      --  Parameter profiles would obviously have to match

   end Test_Renames;

or even

   package Test_Renames is

      type UC_Array is array (Something range <>) of Whatever;

      function "+" (A, B : UC_Array) return UC_Array;

   private

      procedure Foo (A, B : UC_Array; C : out UC_Array);

      function "+" (A, B : UC_Array) return UC_Array
         renames procedure Foo;

   end Test_Renames;

I've introduced the phrase 'renames procedure' to make it clear
the intent of the renames but I have no idea if this would
actually be necessary (and clearly there is no need for a
'renames function'!)

To illustrate the actual performance difference, I did some
measurements, where the only difference was the use of
a function or a procedure and passing in a pair of 2x2
(constrained) arrays and returning a 2x2 (constrained) array.


GNAT 3.15p           | No optimization | Max Optimization
---------------------+-----------------+-----------------
Functions            |     1.373s      |    0.634s
Procedures           |     0.813s      |    0.431s
---------------------+-----------------+-----------------
Improvement          |   +40.8%        |  +32.0%


ObjectAda 7.2.2(U10) | No optimization | Max Optimization
---------------------+-----------------+-----------------
Functions            |     1.531s      |    1.281s
Procedures           |     0.688s      |    0.328s
---------------------+-----------------+-----------------
Improvement          |   +55.1%        |  +74.4%


I'm guess this may come under 'Low priority' and 'Hard to
implement' but thoughts and brick-bats welcome! :-)

****************************************************************

From: Marius Alado Alves
Sent: Tuesday, October 14, 2003  5:55 AM

>  . . .
>   declare
>       Result : UC_Array (Some_Constraint)
>    begin
>       Result := Add (X, Y);  -- Ugly, but fast

You meant
        Add (X, Y, Result);
right?

> . . .
> I'm guess this may come under 'Low priority' and 'Hard to
> implement' but thoughts and brick-bats welcome! :-)

The problem I see is determining the right constraint for Result.
Normally this depends on the constraints of the operands. If I
understand correctly, you expect the compiler to know this. But at least
in the case of complex, non-linear dependencies, I can't see how he can.
For example suppose a set type constrained by cardinality, with "+" for
union. How could the compiler "do the right thing" here?

****************************************************************

From: Martin Dowie
Sent: Tuesday, October 14, 2003  6:09 AM

> >  . . .
> >   declare
> >       Result : UC_Array (Some_Constraint)
> >    begin
> >       Result := Add (X, Y);  -- Ugly, but fast
>
> You meant
>         Add (X, Y, Result);
> right?

Oops - yes

> > . . .
> > I'm guess this may come under 'Low priority' and 'Hard to
> > implement' but thoughts and brick-bats welcome! :-)
>
> The problem I see is determining the right constraint for Result.
> Normally this depends on the constraints of the operands. If I
> understand correctly, you expect the compiler to know this.
> But at least
> in the case of complex, non-linear dependencies, I can't see
> how he can.
> For example suppose a set type constrained by cardinality,
> with "+" for
> union. How could the compiler "do the right thing" here?

I'd settle for:

   declare
      Result : constant UC_Array (Some_Constraint) := X + Y;
   begin
      ...
   end;

****************************************************************

From: Martin Dowie
Sent: Tuesday, October 14, 2003  6:22 AM

I forgot to mention that the other benefit is that a function
allows the result to be declared as a constant, whereas a
procedure doesn't and I always like that extra little bit of
safety (and self documentation)...

****************************************************************

From: Nick Roberts
Sent: Tuesday, October 14, 2003 12:25 PM

> Functions that return unconstrained results incur overhead
> as the result is copied back. An option is to convert the
> call into a procedure and get the benefit of the pass-by
> reference mechanism of an 'out' parameter (ok, the RM
> doesn't force this, but I've yet to see an implementation
> that doesn't do 'the right thing' :-).

What do you think the 'right thing' is, Martin?

In my own compiler (still under construction, I'm afraid) I plan to use
three techniques to ensure that there is no significant inefficiency
associated with the passing back of results of an unconstrained type from a
function.

(1) For all functions which return a result in memory (rather than
registers), a pointer (far, on the IA-32) to the destination in memory is
passed into the function as an invisible parameter. The calling code
allocates appropriate space (which may be in a stack or heap), and then
passes a pointer to this space. If the size of the return type can vary by
no more than n bytes, then space is allocated for its maximum size. Up to n
bytes are thus (acceptably) wasted. I might provide a pragma to set n; the
default value might be 20.

(2) Otherwise, two implementations of the function are generated. One
implementation has constraints for the return type passed in as invisible
parameters, in addition to the destination pointer. This implementation is
used for all calls of the function where the expected subtype is
constrained; the constraints are used by the calling code to allocate the
appropriate amount of space. Within the function, the constraints computed
for the return value are checked against those passed in (and
Constraint_Error raised if the check fails). The function must perform the
check before attempting to update (any part of) the return value, but this
is easy.

(3) The other implementation uses a secondary stack to pass the result back.
Neither constraints nor destination pointer are passed in; the function
itself allocates appropriate space on the secondary stack, and passes out
the destination pointer. All tasks which (could possibly) call this kind of
implementation must have two stacks (which is a bit of a headache for me, on
the IA-32, since I'm finding address space very tight).

I might provide a pragma enabling the case (1) implementation to be
specified for a function (regardless of return type size variance). If the
Access attribute is applied to a function to which case (1) does not apply,
the result points to the case (3) implementation (and all call code calling
indirectly through the value must work accordingly). The same principle
applies to dispatching.

> The pain with converting to use a procedure is that often
> the function is an infix operator:
>
>       Result : constant UC_Array := X + Y;  -- Nice, but slow
>    begin
>
>       Result : UC_Array (Some_Constraint)
>    begin
>       Result := Add (X, Y);  -- Ugly, but fast

This is a strange example, under the circumstances, since there is no
procedure anywhere. I suppose you meant:

      Result : UC_Array (Some_Constraint)
   begin
      Add(X,Y,Result); -- Ugly, but fast

> I'm wondering if it would be possible to create an
> extension to 'renames' that would allow 'Nice and
> fast',

I suggest this is purely an optimisation issue.

The transformation using an explicit renaming could and should be done by
the compiler internally, where the expected subtype (of the function call)
is constrained. See case (2) above. Some implementations may wish to provide
a pragma to allow the user to choose which functions to transform, and
possibly certain other details (such as a size for the second stack).

I think I should point out, just to be explicit, that the transformation is
not possible when the expected subtype is unconstrained.

> To illustrate the actual performance difference, I
> did some measurements, ...

I think the results are interesting.

****************************************************************

From: Dan Eilers
Sent: Tuesday, October 14, 2003  1:19 PM

This is already a proposed amendment, AI-318.
It is not just an efficiency issue where the return type is limited.

****************************************************************


Questions? Ask the ACAA Technical Agent