Version 1.1 of ai12s/ai12-0197-1.txt
!standard 6.9(0) 15-06-03 AI12-0161-1/01
!class Amendment 16-06-07
!status work item 16-06-07
!status received 16-06-07
!priority Low
!difficulty Hard
!subject Generator Functions
!problem
The expression of computation often works on streams of data. It is often
necessary to express computation of streams as incremental routines, working on
chunks of data. But expressing algorithms in an incremental fashion is difficult:
It implies manually keeping and maintaining the state of your algorithm, saving
it on exit, and restoring it on re-entry.
Generators, often called resumable functions, or Iterators in C#, offer a way
to write such incremental code in a straightforward procedural way, by giving
you the possibility to resume a function after it has returned a result.
This resumable return capability is generally called "yield" in other
languages, so we will use the same terminology in this AI.
!proposal
We propose introducing a new kind of subprogram, called a generator. This new
kind of subprogram would not be instantly called, but instead return a
generator object. This generator object would be iterable, automatically (via a
for-loop iteration), or manually, via the use of built-in attributes on the
generator object.
A subprogram of the generator kind would be subject to the following constraints:
1. Use a yield clause instead of a return clause. The subprogram_specification
rule would then become:
subprogram_specification ::=
( "procedure" defining_program_unit_name [ formal_part ] )
| ( "function" defining_designator [ formal_part ] "return" subtype_mark )
| ( "function" defining_designator [ formal_part ] "yield" subtype_mark )
3. Have to contain at least one yield statement. The grammar of the yield
statement would be the following:
yield_statement ::= "yield" expression ";"
4. Can contain return statements, as long as they have no expressions, in order
to allow the user to terminate the execution early.
Given a yielded type T, the result of calling a generator subprogram would be
of type T'Generator. This type is built-in in order to allow implementation
Freedom. It’s a limited and definite type specific to T, so that given to types
T and U T’Generator is incompatible with U’Generator.
There are two ways of using the resulting generator object:
- The first is to use two built-in attributes of the type:
- 'Next will consume and return the next element in the generator. It would
resume the generator and execute until the next yield point, and get back
the result of the yield.
- 'Terminated will return whether the generator has terminated executing or
not.
- The second is to iterate on it via the same for .. of loop that was
introduced in the 2012 version of Ada. This would be equivalent to using the
underlying attributes to conduct the iteration.
If an exception is raised in the generator while it is executing, subsequently
to a call to the ‘Next attribute, the generator is finalized, and the exception
is propagated into the caller of the ‘Next attribute.
When the generator has terminated:
- A flag is set in the generator object, so that users can check it before
calling next again.
- In the eventuality that the user still calls 'Next, a specific exception
should be raised. This exception would be an instance of
Ada.IO_Exceptions.End_Error, because it seems to fit the meaning of the
context.
The prefered method of iterating is to check the ‘Terminated attribute, but we
need the exception so the behavior is defined in case a user doesn’t check the
‘Terminated attribute correctly.
If the generator is finalized before it has terminated, we have no choice but
to raise an exception into the generator in order to finalize all the
corresponding stack frames and variables.
Here is an example of what a prime number generator might look like:
function Prime_Numbers yield Integer is
I : Integer;
begin
loop
if Is_Prime (I) then
yield I;
end if;
I := I + 1;
end loop;
end Even;
--
for Number of Prime_Numbers loop
Put_Line (Integer'Image (Number));
end loop;
--
declare
_Gen_ : Integer'Generator := Prime_Numbers;
--
begin
while not _Gen_'Terminated loop
declare
Number : constant := _Gen_'Next;
begin
... one or more uses of "Number" ...
end;
end loop;
end;
--
--
declare
_Gen_ : Integer'Generator := Prime_Numbers;
--
begin
loop
declare
Number : constant Integer := _Gen_'Next;
begin
... one or more uses of "Number" ...
end;
end loop;
exception
when Stop_Iteration => null;
end;
While the example doesn’t exploit this, it shall be possible to pass parameters
to a generator. However, out and in out parameters don’t make much sense in the
context of a generator. The following rules are proposed:
IN OUT and OUT parameters are disallowed for generator subprograms.
All parameters that are allowed to be passed by copy should be passed by copy.
Byref types remain passed by reference.
!implementation note
While it might be possible to implement a limited version of this feature using
only code rewritings and the existing runtime support, the recommended approach
is a mix of both:
1. Expose a runtime capability for context switching, eg. saving the state of
the stack, registers, and other local information, and restoring it.
2. Do simple expansion phases to expand from the high-level generator code to a
version using the runtime support.
In that scheme, the following code, that we exposed in the proposal section:
function Prime_Numbers yield Integer is
I : Integer;
begin
loop
if Is_Prime (I) then
yield I;
end if;
I := I + 1;
end loop;
end Even;
could be expanded to this code:
package Int_Generators_Impl is new Generators (Integer);
type Is_Prime_Generator is new Int_Generators_Impl.Object
with null record;
overriding procedure Generate
(D : in out Is_Prime_Generator;
Context : Int_Generators_Impl.Context)
is
I : Integer := 0;
begin
loop
if Is_Prime (I) then
Context.Yield (I);
end if;
I := I + 1;
end loop;
end Generate;
Note that this code is already working code, using a runtime library that was
developed on top of a portable stack switching library called PCL.
Not shown would be the code wrapping the implementation specific generator
object into a Ada.Containers.Generators object and calling the appropriate
operation when an item is requested.
# Semantics of the generator object
- Given a type T, the type T’Generator is a type specific to T and it’s
subtypes. It means that given this code:
type T is private;
subtype T2 is T;
T’Generator and T2’Generator are equivalent.
- The type T’Generator is equivalent to a limited definite type whose
declaration could be:
type T'Generator is limited private;
function Next(Gen : in out T'Generator) return T;
function Terminated(Gen : in T'Generator) return Boolean;
!discussion
# Type of the object returned by a generator:
A delicate point of design is the type of the generator objects. We decided for
a built-in opaque type with attributes rather than a library type, for the
following reasons:
1. Using a library type would imply a generic type, so every generator use
would imply a generic instantiation, which is unnecessary boilerplate and is
detrimental to the feature.
2. Using a built-in opaque type gives more freedom in terms of implementation.
There are some arguments in favor of using a library type too:
1. Less language support is necessary, which is always good.
2. Generators must probably have by-reference semantics, since copying a
generator's state might introduce too many corner cases. Using controlled
objects for that seems like a good fit, and they also solve the problem of
managing the data inside the generator, and freeing it when necessary.
3. This allows deriving from Ada 2012 iterator interfaces so that iteration
on generators, which is a wanted feature, is straightforward to use and
implement.
# Custom syntax
This AI includes some new syntax to the Ada language. It is possible to
implement most of the features necessary for generators as a library, so we
could imagine different levels of new syntax:
- It might be possible to specify that a subprogram is a generator via an aspect
rather than via the yield clause.
- Similarly, it might be possible to replace the yield statement by a procedure
call, like in the expanded code example.
- Finally, we could imagine using a library type rather than a built-in type
for generator types.
However, we think that a custom syntax has the benefit of making clear that
something new is going on in terms of control flow, whereas it might be
confusing in the other case. Also, generic library types will have the drawback
of more unnecessary boilerplate.
On the other hand, it could be argued that the syntax reusing the function
keyword is not explicit enough for Ada, because then the exact kind of the
subprogram is determined by whether it has a return or a yield clause. If
that's a concern, a possible syntax could be:
subprogram_specification ::=
( "procedure" defining_program_unit_name [ formal_part ] )
| ( "function" defining_designator [ formal_part ] "return" subtype_mark )
| ( "generator" defining_designator [ formal_part ] "yield" subtype_mark )
Which would lead to generators of the form:
generator Prime_Numbers yield Integer;
# Comparison with the Loop Body procedure AI
Loop body procedure does provide something akin to yield in a limited fashion,
but does not provide the full semantics of yield: What you can do with a
generator that you cannot do with the loop body procedure is to actually pass
around a generator object and iterate on it at any point you want, something
that you cannot do with the loop body procedure because the scope of the
exposed variables is that of the called procedure.
Take the example of a generator that yields tokens:
type Token is private;
function Tokenize (T : Text) yield Token is
T : Token;
begin
loop
--
yield Token;
end loop;
end Tokenize;
procedure Parse (Tokens : Token'Generator) return Node is
T : Token := Tokens'Next;
begin
if T = "function" then
return Parse_Function (Tokens);
elsif T = "if" then
return Parse_If_St (Tokens);
elsif ...
end if;
end;
procedure Parse_Function (Tokens : Token'Generator) return Fn_Node is
Name : Node;
Ret_Type : Node;
T : Token;
begin
Name := Parse_Name (Tokens);
T := Tokens’Next;
pragma Assert(T = "return");
Ret_Type := Parse_Name (Tokens);
end Parse_Function;
...
Token_Stream : Token'Generator := Tokenize (Text);
Parse (Token_Stream);
Here, the tokens will be computed on demand when the Next primitive on the
Token_Generator object is called. You don't have to express your parser in such
a way that a Parse procedure is called at every token, that has to maintain
parsing state. You can pass the Token_Generator object to any procedure,
including dispatching procedure, or procedures defined in other modules.
You cannot do this with the loop body procedure, or with lambdas in general.
- On the other hand, generators don't provide the ability described in the loop
body procedure AI, to modify elements in place in a collection for example,
except if you pass references/pointers.
- It also doesn't allow iterating over several elements per iteration, like the
Key, Value example. This is a nice plus of the loop body procedure proposal,
however this could also be provided by a more general feature allowing
destructuring of simple data types - which is out of scope for this AI.
# Runtime support vs. rewriting into sequential code
An approach entirely based on rewriting user code into code using callbacks
would be problematic when combining separate compilation, and generators being
passed around by subprograms, or more generally in presence of dispatching
programs returning generators.
package A is
function Foo return Int_Generators.Generator;
end A;
package B is
procedure Bar (G : Int_Generators.Generator);
end B;
Since here, the actual implementation wrapped by the generator object might
vary, rewriting will not be possible in every case.
Forcing the users to have a specific type for every generator instance, rather
than allowing a general type depending on the yield type, would allow
rewriting, but it would still be complicated to do separate compilation (akin
to generic expansion), and abstraction would be lost along the way (no
possibility to to have ad-hoc polymorphism on different generators having the
same yield type)
# Syntax sugar for recursive generators
A classic use case of generators is to make lazy traversal of trees. With the
current version of the proposal, this might be done this way:
type Node is record --
Children : Node_Array;
end record;
function Traverse (Root : Node) yield Node is
begin
for Child of Root.Children loop
for Element of Traverse (Child) loop
yield Element;
end loop;
yield Child;
end loop;
end Traverse;
Here we see that we have to manually iterate on the results of the inner
recursive call to the generator. Some languages have syntax sugar for that,
that would allow to pass the control to the inner generator call, allowing a
simpler body for traverse:
for Child of Root.Children loop
yield from Traverse (Child);
end loop;
# Generators receiving values
In some (not all) implementations of generators, generators are able to be
passed values when their control is restored. It might be interesting to:
1. Think about what such a feature might look like in Ada
2. Try and conceive the initial feature in a way that allows extension to
support this at a later stage.
This won't be covered in this proposal though, because it is thought that the
feature as proposed is ambitious enough for an initial revision.
# By-references parameters
The rule is that some types should always be passed by reference, notably
tagged types, task types, limited types ..
The question is whether we should allow passing those as IN parameters to
generator subprograms, knowing that a generator instance might escape the
initial scope in which it has been defined, and hence reference invalid
entities.
On the one hand, disallowing them entirely seems a bit heavy handed. On the
other hand, we have no easy way to prevent access to now invalid byref
parameters.
It is possible that by applying the same accessibility rules to byref
parameters to generators than to access discriminants, we might get the right
accessibility checks - to be discussed.
!ASIS
** TBD.
!ACATS test
An massive set of ACATS B-Test and C-Tests are needed to check that the
new capabilities are supported.
!appendix
From: Tucker Taft
Sent: Tuesday, June 7, 2016 3:15 PM
This is from one of our engineers, Raphaël Amiard, who will be an observer at
the upcoming ARG meeting in Pisa. [This is version /01 of this AI.]
****************************************************************
From: Randy Brukardt
Sent: Tuesday, June 7, 2016 8:24 PM
Looks way too complex to me.
I don't see any discussion of how this interacts with finalization/task
termination. The accessibility notes are scary by themselves, and finalization
seems worse.
Can a generator be passed to a different task (via entry parameters)? If so,
what happens and why? (I.e. consider a local controlled object, like a lock.)
If not, how come? How would such a prohibition be enforced without breaking
the contract model?
This looks like an ARG full employment act. ;-)
****************************************************************
Questions? Ask the ACAA Technical Agent