!standard 6.04(00) 03-12-05 AC95-00084/01 !class amendment 03-12-04 !status received no action 03-12-04 !status received 03-10-14 !subject Performance of unconstrained function calls !appendix From: Martin Dowie Sent: Tuesday, October 14, 2003 5:20 AM Functions that return unconstrained results incur overhead as the result is copied back. An option is to convert the call into a procedure and get the benefit of the pass-by reference mechanism of an 'out' parameter (ok, the RM doesn't force this, but I've yet to see an implementation that doesn't do 'the right thing' :-). The pain with converting to use a procedure is that often the function is an infix operator: declare Result : constant UC_Array := X + Y; -- Nice, but slow begin ... end; declare Result : UC_Array (Some_Constraint) begin Result := Add (X, Y); -- Ugly, but fast ... end; I'm wondering if it would be possible to create an extension to 'renames' that would allow 'Nice and fast', e.g. package Test_Renames is type UC_Array is array (Something range <>) of Whatever; procedure Foo (A, B : UC_Array; C : out UC_Array); function "+" (A, B : UC_Array) return UC_Array renames procedure Foo; -- Tells the compiler that we want the infix notation -- but make the passing mechanism that of the procedure -- -- Parameter profiles would obviously have to match end Test_Renames; or even package Test_Renames is type UC_Array is array (Something range <>) of Whatever; function "+" (A, B : UC_Array) return UC_Array; private procedure Foo (A, B : UC_Array; C : out UC_Array); function "+" (A, B : UC_Array) return UC_Array renames procedure Foo; end Test_Renames; I've introduced the phrase 'renames procedure' to make it clear the intent of the renames but I have no idea if this would actually be necessary (and clearly there is no need for a 'renames function'!) To illustrate the actual performance difference, I did some measurements, where the only difference was the use of a function or a procedure and passing in a pair of 2x2 (constrained) arrays and returning a 2x2 (constrained) array. GNAT 3.15p | No optimization | Max Optimization ---------------------+-----------------+----------------- Functions | 1.373s | 0.634s Procedures | 0.813s | 0.431s ---------------------+-----------------+----------------- Improvement | +40.8% | +32.0% ObjectAda 7.2.2(U10) | No optimization | Max Optimization ---------------------+-----------------+----------------- Functions | 1.531s | 1.281s Procedures | 0.688s | 0.328s ---------------------+-----------------+----------------- Improvement | +55.1% | +74.4% I'm guess this may come under 'Low priority' and 'Hard to implement' but thoughts and brick-bats welcome! :-) **************************************************************** From: Marius Alado Alves Sent: Tuesday, October 14, 2003 5:55 AM > . . . > declare > Result : UC_Array (Some_Constraint) > begin > Result := Add (X, Y); -- Ugly, but fast You meant Add (X, Y, Result); right? > . . . > I'm guess this may come under 'Low priority' and 'Hard to > implement' but thoughts and brick-bats welcome! :-) The problem I see is determining the right constraint for Result. Normally this depends on the constraints of the operands. If I understand correctly, you expect the compiler to know this. But at least in the case of complex, non-linear dependencies, I can't see how he can. For example suppose a set type constrained by cardinality, with "+" for union. How could the compiler "do the right thing" here? **************************************************************** From: Martin Dowie Sent: Tuesday, October 14, 2003 6:09 AM > > . . . > > declare > > Result : UC_Array (Some_Constraint) > > begin > > Result := Add (X, Y); -- Ugly, but fast > > You meant > Add (X, Y, Result); > right? Oops - yes > > . . . > > I'm guess this may come under 'Low priority' and 'Hard to > > implement' but thoughts and brick-bats welcome! :-) > > The problem I see is determining the right constraint for Result. > Normally this depends on the constraints of the operands. If I > understand correctly, you expect the compiler to know this. > But at least > in the case of complex, non-linear dependencies, I can't see > how he can. > For example suppose a set type constrained by cardinality, > with "+" for > union. How could the compiler "do the right thing" here? I'd settle for: declare Result : constant UC_Array (Some_Constraint) := X + Y; begin ... end; **************************************************************** From: Martin Dowie Sent: Tuesday, October 14, 2003 6:22 AM I forgot to mention that the other benefit is that a function allows the result to be declared as a constant, whereas a procedure doesn't and I always like that extra little bit of safety (and self documentation)... **************************************************************** From: Nick Roberts Sent: Tuesday, October 14, 2003 12:25 PM > Functions that return unconstrained results incur overhead > as the result is copied back. An option is to convert the > call into a procedure and get the benefit of the pass-by > reference mechanism of an 'out' parameter (ok, the RM > doesn't force this, but I've yet to see an implementation > that doesn't do 'the right thing' :-). What do you think the 'right thing' is, Martin? In my own compiler (still under construction, I'm afraid) I plan to use three techniques to ensure that there is no significant inefficiency associated with the passing back of results of an unconstrained type from a function. (1) For all functions which return a result in memory (rather than registers), a pointer (far, on the IA-32) to the destination in memory is passed into the function as an invisible parameter. The calling code allocates appropriate space (which may be in a stack or heap), and then passes a pointer to this space. If the size of the return type can vary by no more than n bytes, then space is allocated for its maximum size. Up to n bytes are thus (acceptably) wasted. I might provide a pragma to set n; the default value might be 20. (2) Otherwise, two implementations of the function are generated. One implementation has constraints for the return type passed in as invisible parameters, in addition to the destination pointer. This implementation is used for all calls of the function where the expected subtype is constrained; the constraints are used by the calling code to allocate the appropriate amount of space. Within the function, the constraints computed for the return value are checked against those passed in (and Constraint_Error raised if the check fails). The function must perform the check before attempting to update (any part of) the return value, but this is easy. (3) The other implementation uses a secondary stack to pass the result back. Neither constraints nor destination pointer are passed in; the function itself allocates appropriate space on the secondary stack, and passes out the destination pointer. All tasks which (could possibly) call this kind of implementation must have two stacks (which is a bit of a headache for me, on the IA-32, since I'm finding address space very tight). I might provide a pragma enabling the case (1) implementation to be specified for a function (regardless of return type size variance). If the Access attribute is applied to a function to which case (1) does not apply, the result points to the case (3) implementation (and all call code calling indirectly through the value must work accordingly). The same principle applies to dispatching. > The pain with converting to use a procedure is that often > the function is an infix operator: > > Result : constant UC_Array := X + Y; -- Nice, but slow > begin > > Result : UC_Array (Some_Constraint) > begin > Result := Add (X, Y); -- Ugly, but fast This is a strange example, under the circumstances, since there is no procedure anywhere. I suppose you meant: Result : UC_Array (Some_Constraint) begin Add(X,Y,Result); -- Ugly, but fast > I'm wondering if it would be possible to create an > extension to 'renames' that would allow 'Nice and > fast', I suggest this is purely an optimisation issue. The transformation using an explicit renaming could and should be done by the compiler internally, where the expected subtype (of the function call) is constrained. See case (2) above. Some implementations may wish to provide a pragma to allow the user to choose which functions to transform, and possibly certain other details (such as a size for the second stack). I think I should point out, just to be explicit, that the transformation is not possible when the expected subtype is unconstrained. > To illustrate the actual performance difference, I > did some measurements, ... I think the results are interesting. **************************************************************** From: Dan Eilers Sent: Tuesday, October 14, 2003 1:19 PM This is already a proposed amendment, AI-318. It is not just an efficiency issue where the return type is limited. ****************************************************************