# Version 1.7 of ai05s/ai05-0158-1.txt

Unformatted version of ai05s/ai05-0158-1.txt version 1.7
Other versions for file ai05s/ai05-0158-1.txt

!standard 3.8.1(5)          10-01-20 AI05-0158-1/04
!standard 4.4(3)
!standard 4.5.2(3)
!standard 4.5.2(5)
!standard 4.5.2(27)
!class Amendment 09-06-07
!status work item 09-06-07
!priority Low
!difficulty Medium
!subject Generalizing membership tests
!summary
Extend the syntax of membership tests to simplify complex conditions that can be expressed as membership in a subset of values of any type. Introduce syntax for a list of expressions that may be used in loop contexts as well.
!problem
Conditions of the form (X = A) or else (X = B) or else (X = C) where A, B, C are of some arbitrary type are common, and will be more frequent when pre- and post-conditions become widely used. If the values of A, B,and C are contiguous values of some discrete type, the expression can be written as a membership operation: (X in A..C) . Otherwise this syntactic shortcut is not available. We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. The type does not need to be discrete, and the choices do not need to be static, so that one can write:
if C not in 'A' | 'B' | 'O' then Put_Line ("invalid blood type"); else ..
while Name (1 .. N) in T.Element (X) | T.Element (Y) | T.Element (Z) loop ...
!proposal
We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. The type does not need to be discrete, and the choices do not need to be static, so that one can write:
if C not in 'A' | 'B' | 'O' then Put_Line ("invalid blood type"); else ..
while Name (1 .. N) in T.Element (X) | T.Element (Y) | T.Element (Z) loop ...
!wording
Modify 3.8.1 (5) as follows:
discrete_Choice ::= choice_expression | discrete_Range | others
Modify 4.4(3) as follows:
expression := relation {logical_operator relation}
choice_expression := choice_relation {logical_operator choice_relation}
choice_relation ::= simple_expression [relational_operator simple_expression]
relation ::= simple_expression [not] in membership_choice_list
membership_choice_list ::= membership_choice {'|' membership_choice}
membership_choice ::= choice_expression | range | subtype_mark
logical_operator ::= and | and then | or | or else | xor
Add after 4.5.2(3): (Name Resolution Rules)
If the membership_choice_list in a membership operation has more that one choice, then all choices must have a single common type, which is the tested type of the operation.
Add after 4.5.2 (5) (Static semantics)
The expected type of a choice_expression in a membership_choice is any type.
Modify 4.5.2(27) as follows:
If the choice_list has a single choice, the simple expression and the choice are elaborated in arbitrary order.
If the choice_list has more that one choice, the left operand is elaborated first and the operation is equivalent to a short-circuit sequence of individual tests on each choice.
Modify 4.5.2 (28) as follows:
A membership test using in yields the result True if any of the individual tests on the elements of the choice list yields True.
An individual membership test yields the result True if:
* The choice is an expression, and the simple_expression is equal to the value
of the choice. If the type of the expression is a record type or a limited type, the test uses the primitive equality for the type; otherwise the test uses predefined equality.
* The choice is a range or subtype mark and:
(followed by para. 29-30)
!discussion
The new syntax creates an upward incompatibility: a choice cannot be an expression, as in Ada2005, but is limited to an expression without a membership operator. This is likely to be harmless, and seems to be preferable to making the grammar ambiguous, as shown in the following example, in which a choice list can contain an arbitrary expression:
X : Boolean := .. case X is when Y in 1..10 | 20 => -- ambiguous: -- could be (Y in 1..10) | 20 -- which is otherwise type-incorrect
Even though this is a contrived example, unlikely to show up in realistic code, it seems preferable to modify the grammar as shown above.
There was a suggestion that the type of an expression in a choice should be non-limited. However, given that currently a subtype_mark in a choice can be of any type, it would be awkward to introduce this restriction. Clearly if the type has no primitive equality a membership test over a list of expressions is illegal.
[Editor's note: This decision brings visibility into the legality of membership operations. That seems like a lousy choice (pun intended). Personally, I would prefer the inconsistency. (Well, I'd prefer AI-153-2 instead of this one, which would require these to be discrete only, but make them work everywhere. But that's a separate issue.)]
!example
(See proposal.)
!ACATS test
Add an ACATS test for this new feature.
!appendix
```
From: Robert Dewar
Date: Monday, March 30, 2009  8:02 PM

How about allowing Pascal style sets while we are at it?

pragma Precondition
(xyz(3) = 'w' or else xyz(3) = 'x' or else xyz(3) = 'z');

is pretty horrible, now if only we had y instead of z we could write

(xyz(3) in 'w' .. 'y')

but suddenly for one missing element in the sequence we are driven to the top
form

(xyz(3) in ('w','x','z'))

easy to implement, easy to read, and very convenient

I would also allow e.g.

(xyz(3) in ('d'..'f', 'm', 'r'..'s'))

In the case of the GNAT code, we used to have

Nkind (X) = N_Node1 or else Nkind (X) = N_Node2

and we defined

Nkind_In (X, N_Node1, Node_2)

we have nine variants for 2 to 9 arguments, but really it would be much nicer to
say

Nkind (X) in (N_Node1, N_Node2)

I thought of {} instead of (), but I think () is probably more consistent with

I would allow these forms ONLY in membership tests, and the meaning of

Expr in (A, B, C)

Expr in (A) or else Expr in (B) or else Expr in (C)

if A is an expression

Expr in (A) means Expr = A

if A is a range or subtype name

Epxr in (A) means Expr in A

notin by obvious analogy.

It is a little tempting to allow them in loops as well

for J in (A, B, C) loop

but I would settle for just the membership tests

****************************************************************

From: Tucker Taft
Date: Monday, March 30, 2009  9:09 PM

I would prefer to use the "|" symbol, since that is what is already used in
choice lists.  E.g.:

Xyz(3) in ('w' .. 'x' | 'z')

Not sure whether the parens should be required.

****************************************************************

From: Robert Dewar
Date: Monday, March 30, 2009  9:21 PM

I am cool with | instead of ,

And yes, if you go with | instead of , you could indeed leave out the parens. I
sort of like the parens (I think of them as {} :-))

But I can go either way. Can I presume that the fact that you are
micro-analyzing the syntax means you agree with the basic idea? :-)

****************************************************************

From: Tucker Taft
Date: Monday, March 30, 2009  9:43 PM

I have always been annoyed that I can use multiple choice ranges in a case
alternative, but not in other similar contexts, such as membership tests (and
possibly "for" loops).  I frequently find myself turning what should be a simple
membership test into a case statement merely to get the multiple-choice-range
feature.

So yes, I support using "|" in more contexts. And I agree that preconditions and
postconditions will be heavy users of membership tests, and it will be annoying
if you can only conveniently talk about a single contiguous range of values.

****************************************************************

From: Randy Brukardt
Date: Monday, March 30, 2009  10:01 PM

>
>      (xyz(3) in ('w','x','z'))
>
> easy to implement, easy to read, and very convenient

Humm, the devil is in the details here. For what types are these memberships
supposed to work? Can they be static? Etc.

If they're only for static discrete types (as in case statements, based on
Tucker's responses), then I surely agree that they're easy to implement. I'm not
so sure in other cases.

****************************************************************

From: Robert Dewar
Date: Monday, March 30, 2009  10:11 PM

> If they're only for static discrete types (as in case statements,
> based on Tucker's responses), then I surely agree that they're easy to
> implement. I'm not so sure in other cases.

I gave a semantic equivalence, and no, it is not acceptable to limit these to
static values, but of course they are limited to discrete types, that goes
without saying.

****************************************************************

From: Tucker Taft
Date: Monday, March 30, 2009  10:04 PM

If we don't want to require parentheses in a membership test, then we would have
to change discrete choice to use a simple_expression rather than an expression:

discrete_choice ::=
simple_expression | discrete_range | others

and then we could change membership test to be:

simple_expression [not] in choice_list

where (non-discrete) choice and choice_list are:

choice ::= simple_expression | range | subtype_mark

choice_list ::= choice { '|' choice }

****************************************************************

From: Robert Dewar
Date: Monday, March 30, 2009  10:17 PM

If there are no parens, don't we have confusion in

case x is
when a in b | c

could be

when (a in b) | c

or

when a in (b | c)

****************************************************************

From: Tucker Taft
Date: Monday, March 30, 2009  10:34 PM

Yes, unless we change discrete_choice to use "simple_expression" rather than
"expression" which doesn't sound like such a bad idea in any case.

****************************************************************

From: Robert Dewar
Date: Tuesday, March 31, 2009  8:43 AM

Isn't that a clear upwards incompatibility?

****************************************************************

From: Tucker Taft
Date: Tuesday, March 31, 2009  11:29 AM

Yes, it is an upward incompatibility, though probably a reasonable one.
discrete_choice is used for case alternatives, variant alternatives, and array
aggregates. What you would lose is the ability to use relationals and membership
tests in such contexts, presuming the case expression, discriminant, or array
index is of a boolean type.

Of course these contexts allow no overlap in choices, and the case and variant
alternatives require the choice to be static, so trying to figure how to use a
membership test or a relational in a discrete_choice would be interesting.

E.g.:

case X is
when A < B =>
...
when others =>
...
end case;

or

Y := (A in 1..10 => 42, others => 43);

It is a bit hard to imagine how non-simple_expressions would appear in a
legitimate and legal program. Adding parentheses would probably help such
programs, should they exist.

Note that the suggested change would be upward "consistent," in that the
compiler would always catch the problem.

****************************************************************

From: Bob Duff
Date: Tuesday, March 31, 2009  11:59 AM

> If we don't want to require parentheses in a membership test, then we
> would have to change discrete choice to use a simple_expression rather
> than an expression:
>
>    discrete_choice ::=
>      simple_expression | discrete_range | others

Interestingly, this restriction was present in the Ada 83 syntax, and was
deliberately removed.

See AARM-4.4:

15.b  In various contexts throughout the language where Ada 83 syntax
has expression instead. This reflects the inclusion of modular
integer types, which makes the logical operators "and", "or", and
"xor" more useful in expressions of an integer type. Requiring
parentheses to use these operators in such contexts seemed
unnecessary and potentially confusing. Note that the bounds of a
range still have to be specified by simple_expressions, since
otherwise expressions involving membership tests might be ambiguous.
Essentially, the operation ".." is of higher precedence than the
logical operators, and hence uses of logical operators still have to
be parenthesized when used in a bound of a range.

And AARM-3.8.1:

29.b  The syntactic category choice is removed. The syntax rules for
variant, array_aggregate, and case_statement now use
discrete_choice_list or discrete_choice instead. The syntax rule for
record_aggregate now defines its own syntax for named associations.

choice ::= simple_expression
| discrete_range | others | component_simple_name

****************************************************************

From: Tucker Taft
Date: Tuesday, March 31, 2009  12:40 PM

Yes, I remember us making that change, simply for uniformity, as there was no
particular reason for the restriction.  Now that we have a reason, well that
changes everything!

****************************************************************

From: Robert Dewar
Date: Tuesday, March 31, 2009  3:25 PM

All these interesting comments on the syntax, but what do you think of the basic
idea? :-)

****************************************************************

From: Bob Duff
Date: Tuesday, March 31, 2009  3:39 PM

I haven't decided yet.  More later...

****************************************************************

From: Ed Schonberg
Date: Friday, April 24, 2009  8:28 AM

Going back to the syntax of set comprehensions:  the last proposal was to leave
out parentheses and revert to the Ada83 rule that only simple expressions can
occur is discrete choices.  The fact that this was an incompatibility was
dismissed as "probably harmless".  We just came up with an instance of a
relation in a case statement :  an OR of assorted constants in a modular
context.  An old lesson:  any upwards  incompatibility is suspicious.   So I
would prefer to see set  comprehensions as parenthesized,  |-separated lists,
with their own  production. For now they can only appear in membership tests,
but   having an explicit name for them will make it easier at a later date to
introduce them in loops, as was suggested. From there to treating them as
expressions and having proper set algebra is but an easy step!

****************************************************************

From: Robert Dewar
Date: Friday, April 24, 2009  9:41 AM

1. I much prefer the alternatives syntax, since we already have it in
the language, and it reads very nicely.

I do not see these as set comprehensions, but rather just an
abbreviation in conditionals, similar to COBOL
'
IF A EQUALS 2 OR 3 OR 17 THEN

if A in 2 | 3 | 17 then

seems nice.

2. I am dubious about allowing sets in loops, but even if we
go that far, the | syntax is still natural (and indeed since
the keyword is IN, is a smooth extension).

3. I do not like Ed's "easy step", it sounds like a mess to me from
all points of view (informal definition, formal definition including
type resolution, and implementation), and is simply too large a step,
too much outside what Ada is as a language for my taste.

4. Regarding the issue of upwards incompatibility, we have three choices

a) Just go back to the Ada 83 syntax. As Ed points out, this did
cause an incompatibility in our run time library, where we had

when A or B or C =>

and we have not tried our big test suite with this change (I will
try that run at some point).

If we decide on doing this, I would suggest that we immediately
issue a non-binding interpretation saying that we made a mistake
in Ada 95 and Ada 20xx and that we allow only simple expressions
in choices, allowing compilers to immediately implement the
change now, and avoiding the accumulation of more incompatibility
in existing code.

b) Just agree to resolve the potentially ambiguous forms in the Ada
95 manner. Messy from a formal point of view, but fine in practice
because only strange test programs will contain things like:

case Some_Boolean is
when A in B | C | D => bla bla

c) Muck with the syntax to simply exclude the use of IN or NOT IN
in this context, i.e. we introduce a new class of expression,
e.g, RRESTRICTED_EXPRESSION

which allows unparenthesized use of logical operators but does
not allow unparenthesized use of IN or NOT IN.

Fairly easy to implement, a bit awkward to define, but not

rexpression ::=
rrelation {and rrelation}  | rrelation {and then rrelation}
| rrelation {or rrelation}  | rrelation {or else rrelation}
| rrelation {xor rrelation}

rrelation ::=
simple_expression [relational_operator simple_expression]

Here rexpression = restricted_expression and
rrelation = restricted_relation (just did not feel like that
much typing).

From a users point of view, b) or c) are far preferable to a).

****************************************************************

From: Ed Schonberg
Date: Friday, April 24, 2009  10:41 AM

or d)  require a parenthesized list. Upwards compatible. Clean and orthogonal.

****************************************************************

From: Robert Dewar
Date: Friday, April 24, 2009  10:52 AM

But my list is predicated on keeping the | syntax. Yes, the parenthesis notation
avoids compatibility issues, but I think that is its only advantage. Remember I
originally suggested the parentheses, but as soon as Tuck suggested use of the
existing choices syntax, that made so much more sense to me, because the usage

if A in 2 | 3 .. 6 | 9 then

is so nicely parallel to

case 2 | 3 .. 6 | 9 =>

in both cases, we have an implied OR between equality conditions.

Anyway, let's see if this discussion can inspire some other ARG members to give
their opinions.

****************************************************************

From: Robert Dewar
Date: Friday, April 24, 2009  12:08 PM

I ran a test of our test suite requiring Simple_Expression for choices, number
of problems detected = ZERO, so it is only the one case in our own run-time that
ran into this problem.

****************************************************************

From: Randy Brukardt
Date: Friday, April 24, 2009  9:19 PM

...
> But my list is predicated on keeping the | syntax. Yes, the
> parenthesis notation avoids compatibility issues, but I think that is
> its only advantage. Remember I originally suggested the parentheses,
> but as soon as Tuck suggested use of the existing choices syntax, that
> made so much more sense to me, because the usage
>
>      if A in 2 | 3 .. 6 | 9 then
>
> is so nicely parallel to
>
>      case 2 | 3 .. 6 | 9 =>
>
> in both cases, we have an implied OR between equality conditions.

I agree. If we bother doing this at all, it should look as much as possible like
other things already in the language.

I don't like introducing an incompatibility for it, however -- it doesn't seem
important enough for that. I think Robert's "restricted_expression" idea is
probably best; using a membership (*any* membership) in a case statement is
pretty weird. If we're going to take an incompatibility, it ought to be there
and not in modular operations (which seem much more likely to come up in
practice, and indeed you have an example). But sticking exactly with the Ada 95
rules would be OK, too.

> Anyway, let's see if this discussion can inspire some other ARG
> members to give their opinions.

I've done it...

****************************************************************

From: Robert Dewar
Date: Saturday, April 25, 2009  8:15 AM

I implemented the restricted expression approach in our experimental
implementation of set membership. It took only 10 lines of code, and with this
change we get zero regressions on our own sources and our regression suite (when
we force this restriction on unconditionally).

****************************************************************

From: Ed Schonberg
Date: Sunday, November 1, 2009  10:25 AM

Here is an updated version of AI05-0158 (membership tests)
incorporating the changes suggested by you and Robert.
The new syntax is unambiguous, and presents a minuscule incompatibility with Ada2005.

[Editor's note: This is version /02 of the AI.]

****************************************************************

From: Tucker Taft
Date: Sunday, November 1, 2009  11:13 AM

It would help to have some examples.

Also, rather than using "restricted_*" where "restricted"

choice_expression ::=
choice_relation ...

also, rather than using simply "choice_list" and "choice" how about "membership_choice_list" and "membership_choice" to distinguish it from the other kinds of choices?  Hence:

choice_expression ::=
choice_relation {logical_operator choice_relation}

choice_relation ::=
simple_expression [relational_operator] simple_expression

relation ::= choice_relation |
simple_expression [not] in membership_choice_list

membership_choice_list ::=
membership_choice { '|' membership_choice }

membership_choice ::= choice_expression | range | subtype_mark

...

discrete_choice ::= choice_expression | discrete_range | OTHERS

...

expression ::= relation {logical_operator relation}

logical_operator ::= AND | AND THEN | OR | OR ELSE | XOR

****************************************************************

From: Ed Schonberg
Date: Monday, November 2, 2009  1:25 PM

Good points all. Here is updated version:

[Editor's note: This is version /03 of the AI.]

****************************************************************

From: Tucker Taft
Date: Monday, November 2, 2009  1:43 PM

The wording still talks about "choice_list" but the syntax now uses
"membership_choice_list".

****************************************************************

From: Randy Brukardt
Date: Monday, November 2, 2009  1:48 PM

I'll fix it when I post it. [And I did - ED]

****************************************************************

From: Steve Baird
Date: Tuesday, May 18, 2010  3:33 PM

Ed and I are supposed to come up with improved name resolution rules for
AI05-0158, preferably before Thursday's conference call.

Just to recap, the AI currently says

If the membership_choice_list in a membership operation has more that
one choice, then all choices must have a single common type, which is
the tested type of the operation.

and

The expected type of a choice_expression in a
membership_choice is any type.

These rules would disallow

F in 10 | X

, which seems undesirable.

Following the wording proposed earlier for conditional expressions, we could
replace the above with:

If the membership_choice_list in a membership operation has more than one choice, then the tested type of the membership operation shall be determined by the types of the choices as follows:
o if all choices have the same type, then
that type is the tested type of the membership operation;
o otherwise, if at least one choice has a
non-universal type and every other choice has either
that type or a universal type which covers
that non-universal type, then the non-universal type is the
tested type of the membership operation;

and

The expected type of a choice_expression in a membership_choice
is the tested type of the membership operation.

I think this would be an improvement, but thought may be needed for dealing with
interactions between universal_real and universal_fixed. I think the above rule
would disallow

X : Some_Fixed_Point_Type:= ... ;
begin
if X in Duration'Last / 2.0 | 1.0 then ...

and we need to decide whether a) this is a problem and b) it warrants fixing.
Incidentally, this is less of a problem for conditional expressions because the
context of a conditional expression can impose an expected type or a "shall
resolve to" type, which would then allow this case.

Note that the wording would be a bit more complex, but it would be conceptually
cleaner to lump the tested expression in with the choices in determining the
tested type of the membership operation. In the above example, we would then
consider three expressions
X , Duration'Last / 2.0 , and 1.0
instead of just the last two and the problem would (in this particular case)
vanish.

Wording note. I used the phrase "has the type" above instead of "is of the type"
because of my (perhaps mistaken) impression that this is better when talking
about a range. If we decide that it is perfectly ok to say that "X .. Y is of
the type T", then the wording should be be changed to be consistent with the
conditional expressions proposal. I understand that there is no problem with
"the type of X .. Y is T" - I'm just not 100% sure that it therefore follows
that it is ok to say that "X .. Y is of the type T" because that sounds (perhaps
just to me) as if it is saying that "X .. Y is a value of type T", which is
incorrect.

P.S. There may be a font issue with the current version of the AI.
When it says,

membership_choice ::= choice_expression | range | subtype_mark

, I see "range" displayed as a reserved word rather than as a nonterminal.

****************************************************************

From: Gary Dismukes
Date: Tuesday, May 18, 2010  6:22 PM

> These rules would disallow
>
>     F in 10 | X
>
> , which seems undesirable.

Agreed that such cases should certainly be allowed.

This seems closely analogous with membership tests with ranges (F in 10 .. X), so it seems like we should be able to use similar resolution rules for these new guys.  However, after looking at the rules, I'm scratching my head about how the existing wordin
g works for such membership tests.  The rules are based on the "type of the range", but it's unclear how that gets determined in "mixed-type" cases like the above.  (Relevant paragraphs are 4.5.2(3/2) and 3.5(4-5).)

BTW, I discussed this with Steve, and it's similarly unclear to him, but maybe someone else can shed some light.

> Following the wording proposed earlier for conditional expressions, we
> could replace the above with:
>
> If the membership_choice_list in a membership operation has more than
> one choice, then the tested type of the membership operation shall be
> determined by the types of the choices as follows:
>    o if all choices have the same type, then
>      that type is the tested type of the membership operation;
>    o otherwise, if at least one choice has a
>      non-universal type and every other choice has either
>      that type or a universal type which covers
>      that non-universal type, then the non-universal type is the
>      tested type of the membership operation;
>
> and
>
>     The expected type of a choice_expression in a membership_choice
>     is the tested type of the membership operation.

Maybe OK, but it would be nice to have something simpler.

> I think this would be an improvement, but thought may be needed for
> dealing with interactions between universal_real and universal_fixed.
> I think the above rule would disallow
>
>     X : Some_Fixed_Point_Type:= ... ;
>   begin
>     if X in Duration'Last / 2.0 | 1.0 then ...
>
> and we need to decide whether a) this is a problem and b) it warrants
> fixing. Incidentally, this is less of a problem for conditional
> expressions because the context of a conditional expression can impose
> an expected type or a "shall resolve to" type, which would then allow
> this case.

I can't get too excited about that kind of case.

> Note that the wording would be a bit more complex, but it would be
> conceptually cleaner to lump the tested expression in with the choices
> in determining the tested type of the membership operation.
> In the above example, we would then consider three expressions
>     X , Duration'Last / 2.0 , and 1.0
> instead of just the last two and the problem would (in this particular
> case) vanish.

Again, it would be nice if we could adapt the resolution rules used for range
membership tests (assuming those aren't broken:-).

****************************************************************

From: Steve Baird
Date: Tuesday, May 18, 2010  6:31 PM

> Again, it would be nice if we could adapt the resolution rules used
> for range membership tests (assuming those aren't broken:-).

I agree. If we can get a better understanding of how existing range membership
tests work (and if, as you mentioned, those rules aren't broken), then we should
definitely take another look at the rules for the new forms of membership tests
with that understanding in mind.

****************************************************************

From: Randy Brukardt
Date: Tuesday, May 18, 2010  8:01 PM

Range memberships are the only three-way resolution in Ada, where there are
three expressions that must have the same type but there is no information from
the context as to what types are required. (Parameters differ in this aspect,
because the allowed types for parameters come from the possible interpretations
of the subprogram, even for procedures.) This was a horrible mess to implement
in our strictly top-down resolution scheme.

This must be required by the ACATS, because there is no way we would have done
that mess without some sort of requirement.

How the wording implies that is required might indeed be a mystery; this could
very well be a place where the rules of the language are determined by the ACATS
and not the wording.

****************************************************************

From: Tucker Taft
Date: Tuesday, May 18, 2010  8:40 PM

> This seems closely analogous with membership tests with ranges (F in
> 10 .. X), so it seems like we should be able to use similar resolution
> rules for these new guys.  However, after looking at the rules, I'm
> scratching my head about how the existing wording works for such
> membership tests.  The rules are based on the "type of the range", but
> it's unclear how that gets determined in "mixed-type" cases like the
> above.  (Relevant paragraphs are 4.5.2(3/2) and 3.5(4-5).)
>
> BTW, I discussed this with Steve, and it's similarly unclear to him,
> but maybe someone else can shed some light.

There is no requirement that we state how the types in name resolution get
determined.  If you wish you try every possible type, and if you find one that
satisfies all of the name resolution rules, then that is an acceptable
interpretation.

I agree we may be able to use similar wording as that which is used with ranges.
Essentially we presume there is a "type of the range" and we say that the
expected type for each bound of the range is that type.  If there are multiple
types for which that works, then you got trouble, unless root integer or root
real is one of them, in which case that one is preferred.

One significant difference is that ranges are only for scalar types.  This new
feature is proposed for composite types.  But I don't think that really matters.
The resolution rules for "X" | "Y" are no worse than 'X' .. 'Y' because in both
cases, there are multiple types in package Standard which would match both of
these, and so they are clearly ambiguous. And we want F | "X" to work as well as
F .. 'X', so again, making the rules similar to ranges seems like a good thing.

We might have to say something about "null in new X | new Y"
but that seems like the only one that could possibly cause a real problem.
Perhaps we should simply disallow allocators in this construct.  There seems no
reason to provide a preference for universal access, since no one in their right
mind has a strong desire to write:

null in null | null

We can simply declare that ambiguous (or True) and be done with it.

>> Following the wording proposed earlier for conditional expressions,
>> we could replace the above with:
>>
>> If the membership_choice_list in a membership operation has more than
>> one choice, then the tested type of the membership operation shall be
>> determined by the types of the choices as follows:
>>    o if all choices have the same type, then
>>      that type is the tested type of the membership operation;
>>    o otherwise, if at least one choice has a
>>      non-universal type and every other choice has either
>>      that type or a universal type which covers
>>      that non-universal type, then the non-universal type is the
>>      tested type of the membership operation;
>>
>> and
>>
>>     The expected type of a choice_expression in a membership_choice
>>     is the tested type of the membership operation.
>
> Maybe OK, but it would be nice to have something simpler.

I think all we need to say is that last part, namely

The expected type for each choice_expression in
a membership_choice_list is the tested type of the
membership expression.

We don't need to tell the implementation how to figure out the tested type, in
the same way we don't tell them how to figure out the type of a range. We can
simply say that a membership test has a tested type. If there is more than one
type that satisfies the name resolution rules that govern the tested type, then
it's ambiguous, unless one of the interpretations is a root numeric type, which
is preferred.

****************************************************************

From: Randy Brukardt
Date: Tuesday, May 18, 2010  3:33 PM

...
> I think all we need to say is that last part, namely
>
>     The expected type for each choice_expression in
>     a membership_choice_list is the tested type of the
>     membership expression.
>
> We don't need to tell the implementation how to figure out the tested
> type, in the same way we don't tell them how to figure out the type of
> a range. We can simply say that a membership test has a tested type.
> If there is more than one type that satisfies the name resolution
> rules that govern the tested type, then it's ambiguous, unless one of
> the interpretations is a root numeric type, which is preferred.

That would meaning requiring the ability to match an infinite number of possible
types in any combination in cases with no context. Ada currently only requires
three such expressions (in the X in Y .. Z case), and the expressions have to be
scalar. That took 400+ lines of code in Janus/Ada alone; I don't see any sane
way to generalize it. (The problem cases are things like "10 in X .. Y", where
there is little useful type information from the "tested expression".)

Even ignoring the implementation burden (which is likely to vary for other
compilers), cases with many matching expressions is an invitation to write
tricky code. It's possible to write expressions that resolve only because a
single routine in a set of heavily overloaded ones is omitted. That doesn't seem
necessary to make the goals of memberships possible.

I would be against making resolution even more complicated just for this
specific unusual case. I'd suggest making the tested expression a single type
context, except that is neither compatible nor quite general enough (it would be
consistent with attributes, which is the most similar other cases in the
language). But something on that line (where the type is at least heavily
restricted based on the tested expression) would seem to be more sensible than
the current rules (and I'd suggest we consider extending that to all versions of
the membership test).

****************************************************************

From: Tucker Taft
Date: Tuesday, May 18, 2010  9:29 PM

I wonder why this is significantly harder than:

if 10 = X + Y + Z + 3 + 77 + G + Q + R then ...

These all have to be of the same integer type, and every one of these might be

What makes the following significantly harder?

if 10 in X | Y | Z | 3 | 77 | G | Q | R then ...

I realize that "=" and "+" need to be directly visible, but does that really
simplify the first one that much?  We can eliminate that difference with:

if A and then B and then C and then D then

where A, B, C, and D all need to be of the same boolean type.  It seems like
there are many constructs where you have more than two operands that all need to
be of the same type.  Perhaps the key thing is to think of "|" as a binary
operator rather than as an n-ary operator in your resolution algorithm.  Perhaps
that would make the resolution less of a special case.

Keeping the description of the resolution rules relatively simple and consistent
with other rules in similar contexts, even if they imply a bit more work on the
implementor's behalf, is better for the language.

****************************************************************

From: Edmond Schonberg
Date: Tuesday, May 18, 2010  9:40 PM

I was away from the office today, and missed this discussion.  As far as I can
see, there are no implementation problems at all with the proposed rules, all of
this falls out of the existing resolution machinery in GNAT. I agree with Tuck
that the primary concern should be simplicity of description, there is nothing
fundamentally new here concerning overload resolution.

****************************************************************

From: Randy Brukardt
Date: Tuesday, May 18, 2010  10:19 PM

> I wonder why this is significantly harder than:
>
>     if 10 = X + Y + Z + 3 + 77 + G + Q + R then ...
>
> These all have to be of the same integer type, and every one of these
>
> What makes the following significantly harder?
>
>      if 10 in X | Y | Z | 3 | 77 | G | Q | R then ...
>
> I realize that "=" and "+" need to be directly visible, but does that
> really simplify the first one that much?

Yes. In the first case, there is the context of "any Boolean", which is then
used to prune the set of possible "=" operators. Then each "=" operator is used
(in turn) to resolve the parameters. In this case, the "10" doesn't cause many
to fail, so there are a lot of attempts on the RHS. But it is just repeating the
same thing over and over. Moreover, in practice it doesn't ever get bad (other
than in ACATS-type tests). We didn't believe this in the early days of our
compiler, and added various things to avoid redoing work; but eventually I found
that the slowness we were trying to eliminate came almost solely from the
various techniques to avoid redoing work -- less than 1% of work ever was
redone, and it wasn't worth any overhead to avoid that.

In the second case, we have just air to start with; there is no information at
all about what to resolve to. The only way that I can see to resolve this with
the current technology would be a create a list of all possible types and try
each of them in turn. That would be several orders of magnitude more than any
operator (even "=" rarely has more than 20-30 copies visible). And I wonder
whether that could handle anonymous types, classwide types, and the like
properly.

> We can
> eliminate that difference with:
>
>      if A and then B and then C and then D then
>
> where A, B, C, and D all need to be of the same boolean type.

I don't see anything eliminated here. "and then" is a always visible Boolean
operator for the purposes of resolution. In fact, it is that for *all* purposes
within Janus/Ada: it's just an operator. They're even declared in the
symboltable (with names of "At" and "Others" so that nothing can hit them in a
lookup). Since we chain all of the declarations with the same defining name, we
can get the list of them trivially, and that gives us a small list of possibles
to resolve to.

>  It seems like there are many constructs where you have more than two
> operands that all need to be of the same type.
> Perhaps the key thing is to think of "|" as a binary operator rather
> than as an n-ary operator in your resolution algorithm.  Perhaps that
> would make the resolution less of a special case.

The problem is not the operator, the problem is the lack of context. Even
treating this as a operator, it would have to exist for all types in the closure
of a program (you can use a membership on a type that you don't have visibility
on), so the number is gigantic. If you're able to do that, you could just do
that in the first place -- but I suspect that anonymous types and classwide
types would cause problems.

> Keeping the description of the resolution rules relatively simple and
> consistent with other rules in similar contexts, even if they imply a
> bit more work on the implementor's behalf, is better for the language.

I agree. The most similar context (which is to say *no* context) to a membership
is the prefix of an attribute. It's the current rules for memberships (or lack
of rules) that is the anomoly. It's essentially the only context in Ada where
there is no information from the context *and* there is no requirement for a
"single type". That makes it very unusual.

In any case, I don't think that the impossibility of implementing this in
Janus/Ada is that big of a deal for the language. It's the possibility to write
incredibly tricky expressions that really bothers me. Do we really want:

if F1 in F2 | F3 | F4 | F5 then

to resolve if F1 is defined for Integer/Float/Character/String, F2 for
Integer/Float/Duration/String, F3 for Float/Character/Duration/String, F4 for
Integer/Float/Duration/Character, and F5 for Integer/Float/Character/Duration??

I've already decided I hate this proposal because it makes it impossible to ever
add proper set constraints to the language. (Such constraints need to be
discrete and static in order to be able to affect the value set in sensible
ways, there is no way to insist on such requirements after adding this syntax
for memberships.) So maybe my opinion doesn't matter anymore.

****************************************************************

From: Randy Brukardt
Date: Tuesday, May 18, 2010  10:23 PM

> I was away from the office today, and missed this discussion.
>  As far as I can see, there are no implementation problems at all with
> the proposed rules, all of this falls out of the existing resolution
> machinery in GNAT. I agree with Tuck that the primary concern should
> be simplicity of description, there is nothing fundamentally new here

I agree there is nothing new: membership is already broken, and you guys want to
make it *more* broken. There are no other cases like it in Ada now, and the harm
to understandability that the amazingly complex resolution brings is all out of
proportion to the value of its use (which is simple subtypes and ranges made up
of literals). These complex cases never happen in real use, so why burden both
implementations *and* readers with the possibility??

****************************************************************

From: Tucker Taft
Date: Tuesday, May 18, 2010  10:30 PM

so we never need to create a list of
possible types.  That may be why this
kind of thing doesn't add any significant complexity to our overload resolution.

****************************************************************

From: Randy Brukardt
Date: Tuesday, May 18, 2010  10:37 PM

> I agree there is nothing new: membership is already broken, and you
> guys want to make it *more* broken. There are no other cases like it
> in Ada now, and the harm to understandability that the amazingly
> complex resolution brings is all out of proportion to the value of its
> use (which is simple subtypes and ranges made up of literals).
> These complex cases never happen in real use, so why burden both
> implementations *and* readers with the possibility??

Let me be more constructive. The problem here is that the rules assume that
there is a way to determine the tested type from the part of the membership
following the "in", when that is not actually true.

Perhaps we should fix this for this new construct by adding the name of a type:

E1 in Integer when E2 | E2

Then E2 and E3 have to resolve to Integer, and finally E1 has to resolve to
Integer. We no longer have to guess the type to resolve to.

It would be too much to ask to require this for existing memberships (although
it would be a good idea).

****************************************************************

From: Edmond Schonberg
Date: Wednesday, May 19, 2010  8:17 AM

Wouldn't it be simpler to  just require the left operand to be a complete
context? simple to state and implement,.

****************************************************************

From: Randy Brukardt
Date: Wednesday, May 19, 2010  12:51 PM

I thought about that, but if applied to memberships in general, it would prevent
something like
10 in Some_Subtype
which seems useful and has no resolution issues as the RHS provides a unique
(sub)type.

I also wonder what that would do to the new uses of memberships for
accessibility checking (as anonymous access types always need to be converted to
something). This latter I'd have to think about some more, by going back to that
AI and looking at the examples.

Doing it just for the new memberships would probably be fine, but slightly
inconsistent.

The existing wording for resolution of memberships implies that the RHS provides
a type; if we could preserve that we wouldn't need to make any other
incompatible change or add an inconsistency.

****************************************************************

From: Tucker Taft
Date: Wednesday, May 19, 2010  1:30 PM

That's most useful with named numbers.
And those resolve just fine to universal integer.
For compatibility, if we have a single membership choice, then we should stick
with the current rules where the type comes from the RHS (at least officially).
But if there is more than one choice, we could say the type comes from the LHS,
and it can't be one of those things that requires context, like a string literal
or a character literal or an allocator.

That doesn't seem too nasty.

****************************************************************

From: Bob Duff
Date: Wednesday, May 19, 2010  7:39 PM

In ARG e-mails, we should use AI numbers AND titles in the subject line.  Please
-- I'm too old to remember what AI-158 means.  ;-) Although I can guess from the
discussion content...

Tucker Taft wrote:

> There is no requirement that we state how the types in name resolution
> get determined. ...

Right.  We (language designers) just give a set of simultaneous equations in N
variables, and let the implementers figure out how to solve them. AARM hints (as
we have in the case of assignment_statement) are nice.

Edmond Schonberg wrote:

> I was away from the office today, and missed this discussion.

I was NOT out of the office, at least not any more than usual (i.e. I am ALWAYS
out of THE office), but I deliberately ignored the discussion until now.  ;-)

>...As far
> as I can see, there are no implementation problems at all with the
>proposed rules, all of this falls out of the existing resolution
>machinery in GNAT. I agree with Tuck that the primary concern should
>be simplicity of description, there is nothing fundamentally new here

I agree.

On the other hand, I'm not entirely opposed to Randy's view, that having type
information flow right-to-left in the syntax tree is not really helpful in terms

Randy Brukardt wrote:

> Perhaps we should fix this for this new construct by adding the name
> of a
> type:
>
>             E1 in Integer when E2 | E2
>
> Then E2 and E3 have to resolve to Integer, and finally E1 has to
> resolve to Integer. We no longer have to guess the type to resolve to.

But this is just:

Integer'(E1) in E2 | E2

Whatever the resolution rules, we already have syntax to nail down the type of
E1.

Randy Brukardt wrote:

> In any case, I don't think that the impossibility of implementing this
> in Janus/Ada is that big of a deal for the language.

I think that's a key (political) point!

Difficulties for Janus/Ada are interesting as a data point, but we should not
outlaw new features simply because of difficulties for Janus/Ada. Sorry, Randy,
but I think that makes sense at this point.

On the other hand:

>...It's the possibility to
> write incredibly tricky expressions that really bothers me. Do we
>really
> want:
>
>    if F1 in F2 | F3 | F4 | F5 then
>
> to resolve if F1 is defined for Integer/Float/Character/String, F2 for
> Integer/Float/Duration/String, F3 for Float/Character/Duration/String,
> F4 for Integer/Float/Duration/Character, and F5 for
> Integer/Float/Character/Duration??

I'm generally in agreement with Randy's discomfort with this sort of thing,
purely from a readability point of view (nothing to do with implementation
difficulties).

****************************************************************

From: Edmond Schonberg
Date: Wednesday, May 19, 2010  7:53 PM

> I'm generally in agreement with Randy's discomfort with this sort of
> thing, purely from a readability point of view (nothing to do with
> implementation difficulties).

One can feel plenty of discomfort with this kind of code, but

a) no one writes this sort of thing (expect eager ACATS designers :-)!

b) It should just fall out of a proper resolution machinery. We have known since
1982 that the proper resolution algorithm needs two passes: one upwards to
recall the authors of the paper right now, but the point is that the set of
simultaneous equations that Bob mentions can be shown to be solved by just these
two passes, and not require some arbitrary iteration, which is rather nice.

c) without some notion of set intersection I have no idea how one can
resolve an assignment statement   F (X) := G (X);

when both sides are overloaded. There are two sets of candidate interpretations
for F and G, and you need to find the (hopefully unique) common type between the
two sets.

Once you have that, the resolution of
if F1 in F2 | F3 | F4 | F5 then

proceeds by performing these intersections repeatedly to find the common
interpretation, if any.  In practice of course the alternatives will be
constants of a some enumeration type and the heavy machinery will not be used.
In any case, there are no implementation difficulties.

****************************************************************

From: Randy Brukardt
Date: Wednesday, May 19, 2010  8:37 PM

> b) It should just fall out of a proper resolution machinery.
> We have known since 1982 that the proper resolution algorithm needs
> two
> passes: one upwards to collect interpretations, and one downwards to
> resolve overloading.  I don't recall the authors of the paper right
> now, but the point is that the set of simultaneous equations that Bob
> mentions can be shown to be solved by just these two passes, and not
> require some arbitrary iteration, which is rather nice.

Whoa! "Proper resolution algorithm"?? What on earth are you talking about? I
recall such a paper back in the dark ages, which was really hard to understand
and also used way to much memory for our 56K CP/M systems. It simply wasn't
practical.

In any case, it also turns out that all of this concern about efficiency of
resolution is misplaced. On real programs, it simply doesn't matter how it is
done. The only problem we ever had with our recursive resolution was with an
ACATS test that contained 50 concatenation operators, and the solution was to
resolve the parameters in the reverse order. (You could still trigger a problem
with an expression with 40+ levels of parentheses and heavily overloaded
operators like "&", but I've yet to see such an expression in practice or in the
ACATS.)

> c) without some notion of set intersection I have no idea how one can
> resolve an assignment statement   F (X) := G (X);
>
> when both sides are overloaded. There are two sets of candidate
> interpretations for F and G, and you need to find the (hopefully
> unique) common type between the two sets.

You need more imagination. :-)

Since the LHS of an assignment is always into a Name of some sort, we start with
the list of solutions for that Name (such a list has to exist, and it is set up
before we start any resolution), and then try resolving the RHS for each of the
types determined by the solutions (usually, there is only one, of course). There
must be exactly one that works (well, it's bit more complex than that because of
the preference rules for root_integer/root_real).

That doesn't extend well because in other cases we cannot assume the expression
is a Name. Nor do we have any way to hold onto any "set" of results; all we are
keeping track of is whether exactly one solution has been found and the type of
that solution.

> Once you have that, the resolution of
>      if F1 in F2 | F3 | F4 | F5 then
>
> proceeds by performing these intersections repeatedly to find the
> common interpretation, if any.  In practice of course the alternatives
> will be constants of a some enumeration type and the heavy machinery
> will not be used. In any case, there are no implementation
> difficulties.

As I have repeatedly said, this is not true. There is never a "set" of results
to intersect in any place in Janus/Ada; it's all done with code and scalar local
variables. In early versions of Janus/Ada (up until 32-bit versions, not widely
used until the mid 90s, long after all of the basic parts of our compiler were
designed), heap space was limited to 64K and thus was at an extreme premium,
thus we *always* used code rather than data structures if that was possible.
Sets of arbitrary items are expensive in that context; we only built them when
absolutely necessary (the set of possible solutions was one such case, and it
was a major problem in terms of memory usage and heap fragmentation).

There is a later compiler phase that walks the tree to remove unused
interpretations and enforce legality rules; it has nothing to do with
resolution.

In any case, we're clearly talking past each other. I'm not going to support
this with general resolution, period, both because I think the potential cost in
readability is dangerous and unnecessary, and because I cannot in good
conscience vote for anything that is against the interests of RRS (at least
without significant user benefit, and this resolution rule has none). Further
discussion of this sort is futile.

****************************************************************

```

Questions? Ask the ACAA Technical Agent