Version 1.1 of ai05s/ai05-0186-1.txt

Unformatted version of ai05s/ai05-0186-1.txt version 1.1
Other versions for file ai05s/ai05-0186-1.txt

!standard 13.15 (0)          09-11-02 AI05-0186-1/01
!class amendment 09-11-02
!status work item 09-11-02
!status received 09-11-02
!priority Medium
!difficulty Medium
!subject Global-in and global-out annotations
!summary
Annotations are provided to describe the use of global objects by subprograms.
!problem
We're considering adding many kinds of annotations to Ada (see AI05-0145-1 and AI05-0146-1) in order to increase the ability of a programmer to specify contracts for types and subprograms. These contracts are defined to be checked at runtime, but are intended to be potentially checked (at least in part) statically.
However, without knowledge of side-effects of functions, a tool has to assume the worst about any user-defined functions. That is, that the result of a function can change even when called with the same operands. Other assumptions could cause incorrect programs to be accepted as correct, and if the assumptions were used to omit "proved" annotations, to erroneous execution. Both of these results are unacceptable for a feature intended to improve the correctness of programs.
The worst-case assumptions pretty much prevents any analysis unless the bodies of any user-defined functions used in the annotations are available. This is bad, as it prevents analysis of programs as they are constructed. If the body is not available, no analysis is possible. Moreover, analysis depending on a body require creating pseudo body dependencies (if the body is changed, any analysis depending on that properties of that body would have to be performed again); but the language does not allow these to be "real" body dependencies (any recompilation needed has to occur automatically).
Ideally, analysis at the initial compile-time of a unit would be possible, as it is important to detect errors as soon as possible. More information about function side-effects is needed in the specification of subprograms in order to accomplish that goal.
!proposal
Add global_in and global_out annotations to Ada.
!wording
Semi-formal description:
Add to the aspect syntax given in AI05-0184-1:
Aspect_Mark => BODY Annotation_List
[Note: I'm using "Body" here to differentiate this list from the normal expression used for other aspects. That's necessary so that special resolution
Annotation_List ::= Annotation_Item {, Annotation_Item}
Annotation_Item ::= *object_*name | *access_subtype_*name | null | private [in *package*name]
The names in these items resolve with the same visibility as other aspect expressions (that usually is the end of the visible part). The names must resolve without context.
Aspects Global_In and Global_Out are defined.
Static Semantics
A read or write of an object O is covered by an annotation in an Annotation_List if:
* O statically denotes a part of an object named in the Annotation_List;
* The annotation PRIVATE without a package name occurs in the list;
* The annotation PRIVATE with a package name P occurs in the list, and O denotes an object declared in P, or O denotes a part of an object produced by a dereference of an object of an access type declared in P; or
* O denotes a part of an object produced by a dereference of an object of an access type named in the Annotation_List.
AARM Ramification: "private" by itself allows anything to be read or written.
An annotation A1 is compatible with another annotation A2 if:
* A1 contains NULL; or
* A2 contains the annotation PRIVATE without a package name; or
* All of the following must be true: -- Each object_name in A1 is covered by an annotation in A2; -- Each access_subtype_name in A1 is found in A2; and -- For each PRIVATE annotation with a package name P, there is -- a PRIVATE annotation in A2 that statically denotes P.
AARM Ramification: This is not symeterical.
Legality Rules
The Annotation_Item NULL has to be the only item in an Annotation_List if it is given.
An object_name given in an Annotation_List shall statically denote some object.
If a subprogram contains a Global_In annotation, then:
* every object that is read by the subprogram body shall either be accessible from the function call, or be covered by the Global_In annotation.
* every call in the subprogram body shall have a compatible Global_In annotation.
[The first part means that the object has the accessibility of the function call.]
AARM Ramification: Local objects and the parameters of the call do not need to appear in a Global_In annotation.
If a subprogram contains a Global_Out annotation, then:
* every object that is written by the subprogram body shall either be accessible from the function call, or be covered by the Global_Out annotation.
* every call in the subprogram body shall have a compatible Global_Out annotation.
---
Add to 4.5:
Unless otherwise specified, the predefined operators of the language all have implied annotations of Global_In => null, Global_Out => null.
[Do we want to give annotations to other operations, such as some of those in the containers?]
[We could define that language-defined Pure subprograms that do not have access parameters have implied annotations of Global_In => null, Global_Out => null.]
!discussion
The global_in and global_out annotations are intended to be descriptive. That is, the annotations describe the behavior of the subprogram body. As such, these annotations are checked. There is no benefit to be gained from lying about side-effects; such incorrect information could easily lead to incorrect code being generated (if the compiler uses the annotations). Moreover, if it is impractical to specify the correct annotations, it makes sense to specify none at all.
All of the global_in and global_out annotations proposed can be statically checked. It is possible to imagine annotations that cannot be statically checked, but these are assumptions as opposed to descriptive. Assumptions are a different set of annotations.
Note that it is possible to lie about the use of globals of subprograms that are imported. This is of course impossible to check, but thankfully is covered by B.1(38.1/2). Incorrect annotations on imported subprograms could cause a program to be erroneous.
---
These annotations can be used by the compiler to guide code generation as well as for static analysis. In that case, rules similar to those for Pure subprograms should be applied -- it is OK to assume that the annotations are correct, but if the subprogram is called, then the compiler must use the actual side-effects, not any assumed because of annotations.
---
It might make the most sense to define these in Annex H (High-Integrity Systems), rather than in Section 13. The most important thing is to define the syntax so that implementations can implement it and the checking.
!example
** TBD **
!ACATS test
!appendix

From: Randy Brukardt
Sent (privately): Wednesday, March 18, 2009  4:30 PM

As I understand it (mainly from an old message of John's about "exposing your
globals"), Global_In is a list of names of globals touched by the body of a
subprogram. It also ought to include two special names: "null" (for explicitly
stating that no globals are harmed by this subprogram :-); and "private" (for
globals that aren't visible at the point of the annotation). By default, all
subprograms have

    Global_In => private,

which says that they touch globals but we aren't going to say which ones.

Presumably, Global_Out are a similar list of globals written by a subprogram.

Do I have it so far?

Now, the problem with the SPARK definition is that it doesn't take into account
things like storage pools and tasks, which we'll have to do.

For the storage pool problem, I think that declaring any collections read would
be sufficient, and that could easily be done by mentioning the name of the
access type in the annotation. (Directly naming the pool has two problems: the
pool may not be visible while the access type is, and secondly, the default pool
does not have a name.)

I'm going to ignore tasks for now (not really relevant to my questions).

I do think that the compiler needs to check that this listing is correct. (And I
think that is fairly easy: any object whose accessibility is outside of that of
the subprogram call "shall be listed" in the Global_In annotation unless the
Global_In annotation includes "private". ["Subprogram call" allows
accessing/modifying the parameters.] That would of course include anything
listed in the Global_In annotation of any called routines. Language-defined
operators other than "=" for tagged types would be defined to have Global_In =>
null.) Having this information incorrect would lead to all kinds of bad
assumptions by the compiler and by the reader. And you can always use Global_In
=> private if you don't want to say - just don't lie.

I note that this gets the effect of the "strict" annotation that I was lobbying
for, but in a way which is much harder to argue with. ("Global_In => null" means
"I don't have a side effect other than as declared in the spec", and the
compiler and reader can depend on that.) We don't get the elaboration benefit (I
would have allowed early calls on strict functions, because so long as their
parameters are frozen, they are safe), but that is no big deal.


But I am wondering about accesses to sub-parts of parameters. My "strict"
definition would have banned that, because it would allow depending on
side-effects. The way I wrote the description above would also ban that, but I
worry that people would be annoyed. This problem comes from my recognition that
the Claw Is_Valid routine is not strict, even though it only depends on reading
its parameter. For example:

     type A_Window is private;

     function Is_Valid (A : A_Window) return Boolean
     when Global_In => ???;

  private

     type Data is record
        Valid : Boolean := False;
        Handle : HWnd := Null_HWnd;
        ...
     end record;
     type Data_Access is access all Data;

     type A_Window is record
         My_Data : Data_Access;
         ...
     end record;

The problem here is that the user of the GUI application can manually close a
window, at which point Valid gets set to False and the Handle gets set to
Null_HWnd (among other things, of course).

Is_Valid only reads its parameter, but the portion that is pointed at can be
modified asynchronously by another task. So it would be wrong for an analyzer to
depend on the value of Is_Valid not changing. (This is a huge challenge when
programming with Claw.)

So, does this mean that type Data_Access must be included in the Global_In list?
(In this particular case, since the type is defined in the private part, the
Global_In list would have to be "private").

****************************************************************

From: Tucker Taft
Sent (privately): Wednesday, March 18, 2009  5:31 PM

...
> Do I have it so far?

I hadn't thought about it too much in detail yet.  As you know SPARK has
something like this.

The notion of "private" doesn't sound too useful.  I would suggest allowing the
names of packages as well as objects, where the name of a package means some
presumably non-visible "state" in the package (spec or body).  I suspect we
would need to parenthesize the list if there is more than one item, to avoid
creating a syntactic mess, with "null" substituting for the syntactically
illegal "()".

>
> Now, the problem with the SPARK definition is that it doesn't take
> into account things like storage pools and tasks, which we'll have to do.
>
> For the storage pool problem, I think that declaring any collections
> read would be sufficient, and that could easily be done by mentioning
> the name of the access type in the annotation. (Directly naming the
> pool has two
> problems: the pool may not be visible while the access type is, and
> secondly, the default pool does not have a name.)

Yes, an access type would seem reasonable as a stand-in for the collection it
represents.  And a package name would cover any non-visible *pool-specific*
access types declared within it.

If you are going through a non-visible *general* access type, then I think we
would want some notation for that if the access type is private, but the
designated type is not.  E.g. "all aliased T" might be a legal entry in the
list, and perhaps simply "all aliased" if the type is itself non-visible.

Alternatively, it might be appropriate in some cases to identify the particular
pointed-to object referenced/updated, e.g. Global_In => (X.all, Y.T.all),
Global_Out => (X.A, Y.B.all); Saying an entire access collection is affected is
too imprecise for some kinds of analysis.  And if the access type is a general
access type, then you are effectively "killing" all aliased objects of the
designated type (or types covered by it).

> I'm going to ignore tasks for now (not really relevant to my questions).

I don't see tasks as a big special case.  A task has state just like other
objects.  The slightly weird thing is that you can change the state of a task
even though you have it as an "IN" parameter.  Probably one way to think about
it is that every task has its own "collection", and the stack of the task is
allocated out of that collection, and the task object is really just a pointer
to that.  A call on an entry of a task passes an implicit dereference of that
pointer to the accept statement.

> I do think that the compiler needs to check that this listing is correct.
> (And I think that is fairly easy: any object whose accessibility is
> outside of that of the subprogram call "shall be listed" in the
> Global_In annotation unless the Global_In annotation includes "private". ["Subprogram call"
> allows accessing/modifying the parameters.] That would of course
> include anything listed in the Global_In annotation of any called routines.
> Language-defined operators other than "=" for tagged types would be
> defined to have Global_In => null.) Having this information incorrect
> would lead to all kinds of bad assumptions by the compiler and by the
> reader. And you can always use Global_In => private if you don't want to say - just don't lie.

Yes, I would agree with all of this.  But I would also try to define the
semantics so that if you manage to lie (e.g. by referencing imported
subprograms), the world doesn't implode. Instead it is like pragma Pure, the
compiler is allowed to avoid calling a subprogram again if it concludes
everything it takes as input is unchanged, and use the prior results of the
call.  It isn't allowed to assume that if it actually *makes* the call, it will
always get the same answer.

Something similar would be said about Global_Out, which would permit reusing
cached values after a call rather than reloading them from memory if they
supposedly wouldn't be affected by the call, but if you actually go to the
trouble of *reloading* the value from memory, you shouldn't presume this newly
loaded value is identical to the old one, and use the newly reloaded value to,
say, index into an array without rechecking against the bounds.

> I note that this gets the effect of the "strict" annotation that I was
> lobbying for, but in a way which is much harder to argue with.
> ("Global_In => null" means "I don't have a side effect other than as
> declared in the spec", and the compiler and reader can depend on
> that.) We don't get the elaboration benefit (I would have allowed
> early calls on strict functions, because so long as their parameters
> are frozen, they are safe), but that is no big deal.
>
>
> But I am wondering about accesses to sub-parts of parameters. My "strict"
> definition would have banned that, because it would allow depending on
> side-effects.

I don't really understand this last sentence.  I'm not sure what is a "sub-part"
and I'm not sure what you mean about "depending on side-effects."  I think I
understand what you mean in your example below, but I don't understand this
introductory sentence.

> ... The way I wrote the description above would also ban that, but
> I worry that people would be annoyed. This problem comes from my
> recognition that the Claw Is_Valid routine is not strict, even though
> it only depends on reading its parameter. For example:
>
>      type A_Window is private;
>
>      function Is_Valid (A : A_Window) return Boolean
>      when Global_In => ???;
>
>   private
>
>      type Data is record
>         Valid : Boolean := False;
>         Handle : HWnd := Null_HWnd;
>         ...
>      end record;
>      type Data_Access is access all Data;
>
>      type A_Window is record
>          My_Data : Data_Access;
>          ...
>      end record;
>
> The problem here is that the user of the GUI application can manually
> close a window, at which point Valid gets set to False and the Handle
> gets set to Null_HWnd (among other things, of course).
>
> Is_Valid only reads its parameter, but the portion that is pointed at
> can be modified asynchronously by another task. So it would be wrong
> for an analyzer to depend on the value of Is_Valid not changing. (This
> is a huge challenge when programming with Claw.)
>
> So, does this mean that type Data_Access must be included in the
> Global_In list? (In this particular case, since the type is defined in
> the private part, the Global_In list would have to be "private").
>
> Thanks in advance for your answer.

If you read something you get through an access value, then you will need to
account for it somehow, as "blah.all", the name of an access type, the name of a
package, or something like "all aliased [T]".

I think you are also talking about issues that have to multi-threading here.  If
the asynchronous event can happen at "any" time, then I would say you should
really declare the "Valid" field as volatile, in which case we would probably
want some way of stating that the result of a function is volatile and hence
essentially unpredictable.

If the aynchronous update can only happen during to some kind of
potentially-blocking or signaling operation, then it should be handled the
normal way things are handled at potential task signaling/blocking points, where
some appropriate part of the universe might be affected by the actions of other
tasks.

****************************************************************

From: Randy Brukardt
Sent (privately): Wednesday, March 18, 2009  5:59 PM

...
> The notion of "private" doesn't sound too useful.  I would suggest
> allowing the names of packages as well as objects, where the name of a
> package means some presumably non-visible "state" in the package (spec
> or body).  I suspect we would need to parenthesize the list if there
> is more than one item, to avoid creating a syntactic mess, with "null"
> substituting for the syntactically illegal "()".

Well, it surely isn't useful for analysis, but it seems necessary to have a
default that says, yup, we have some side effects and we aren't going to say
which ones. Because we surely can't require this annotation on all subprograms,
but we do have to know what to do if such a subprogram is called from inside of
one that does have an annotation.

If there is a better way to express the default of "anything goes, assume the
worst", I'm happy to see it.

...
> > I do think that the compiler needs to check that this listing is correct.
> > (And I think that is fairly easy: any object whose accessibility is
> > outside of that of the subprogram call "shall be listed" in the
> > Global_In annotation unless the Global_In annotation includes "private".
> > ["Subprogram call" allows accessing/modifying the parameters.] That
> > would of course  include anything listed in the Global_In annotation
> > of any called routines.
> > Language-defined operators other than "=" for tagged types would be
> > defined to have Global_In => null.) Having this information
> > incorrect would lead to all kinds of bad assumptions by the compiler
> > and by the reader. And you can always use Global_In => private if
> > you don't want to say - just don't lie.
>
> Yes, I would agree with all of this.  But I would also try to define
> the semantics so that if you manage to lie (e.g. by referencing
> imported subprograms), the world doesn't implode.
> Instead it is like pragma Pure, the compiler is allowed to avoid
> calling a subprogram again if it concludes everything it takes as
> input is unchanged, and use the prior results of the call.  It isn't
> allowed to assume that if it actually
> *makes* the call, it will always get the same answer.

Well, for static analysis, either you have the truth or the world implodes. :-)
And that's also true for legality rules - if you depend on an incorrect
annotation when enforcing a legality rule (or more accurately not enforcing),
it's pretty much garbage in, garbage out. Probably would want to define that as
a bounded error if at all possible (either you get an error or the code works at
runtime).

But I agree that we pretty much have to leave the Duff door open: lying in
annotations on an imported subprogram can't be banned or detected.

I do agree that uses in optimizations and code selection should follow rules
like you give here.

> Something similar would be said about Global_Out, which would permit
> reusing cached values after a call rather than reloading them from
> memory if they supposedly wouldn't be affected by the call, but if you
> actually go to the trouble of *reloading* the value from memory, you
> shouldn't presume this newly loaded value is identical to the old one,
> and use the newly reloaded value to, say, index into an array without
> rechecking against the bounds.

Right. I'm not personally as interested in Global_Out, at least not right this
minute, but I can see how it would be useful.

...
> > But I am wondering about accesses to sub-parts of parameters. My "strict"
> > definition would have banned that, because it would allow depending
> > on side-effects.
>
> I don't really understand this last sentence.  I'm not sure what is a
> "sub-part" and I'm not sure what you mean about "depending on
> side-effects."  I think I understand what you mean in your example
> below, but I don't understand this introductory sentence.

I mean cases like the example below. In other words, ignore the above if it is
too confusing.

> > ... The way I wrote the description above would also ban that, but I
> > worry that people would be annoyed. This problem comes from my
> > recognition that the Claw Is_Valid routine is not strict, even
> > though it only depends on reading its parameter. For example:
> >
> >      type A_Window is private;
> >
> >      function Is_Valid (A : A_Window) return Boolean
> >      when Global_In => ???;
> >
> >   private
> >
> >      type Data is record
> >         Valid : Boolean := False;
> >         Handle : HWnd := Null_HWnd;
> >         ...
> >      end record;
> >      type Data_Access is access all Data;
> >
> >      type A_Window is record
> >          My_Data : Data_Access;
> >          ...
> >      end record;
> >
> > The problem here is that the user of the GUI application can
> > manually close a window, at which point Valid gets set to False and
> > the Handle gets set to Null_HWnd (among other things, of course).
> >
> > Is_Valid only reads its parameter, but the portion that is pointed
> > at can be modified asynchronously by another task. So it would be
> > wrong for an analyzer to depend on the value of Is_Valid not
> > changing. (This is a huge challenge when programming with Claw.)
> >
> > So, does this mean that type Data_Access must be included in the
> > Global_In list? (In this particular case, since the type is defined
> > in the private part, the Global_In list would have to be "private").
> >
> > Thanks in advance for your answer.
>
> If you read something you get through an access value, then you will
> need to account for it somehow, as "blah.all", the name of an access
> type, the name of a package, or something like "all aliased [T]".

Right. I suspect some people will not like that much (you have even argued wrt
to constants that such accessed parts are logically part of the passed object).
But in this case, they are *not* logically part of the passed object.

> I think you are also talking about issues that have to multi-threading
> here.  If the asynchronous event can happen at "any" time, then I
> would say you should really declare the "Valid" field as volatile, in
> which case we would probably want some way of stating that the result
> of a function is volatile and hence essentially unpredictable.
>
> If the aynchronous update can only happen during to some kind of
> potentially-blocking or signaling operation, then it should be handled
> the normal way things are handled at potential task signaling/blocking
> points, where some appropriate part of the universe might be affected
> by the actions of other tasks.

Claw uses locks so that the state can't change inside of a Claw-defined routine.
If a state change needs to happen, it is delayed until the lock is available,
meaning after the current routine is finished. But we don't lock things that
only read scalar components (and don't have dependencies), because those can't
be in an inconsistent state: the Valid component can only be True or False.

I suppose that they really ought to be declared volatile, but given the nasty
impact on code generation, and that we only care that Valid is not some value
other than True or False (which can't happen with any of the Ada compilers we've
used), we never did that.

In any case, the pool or object has to be identified somehow in Global_In. That
eliminates my concern.

It does seem that a real proposal would be valuable here. If I had it, I could
point to it rather than duplicating a lot of that in AI-144's discussion.

****************************************************************

Questions? Ask the ACAA Technical Agent