!standard 3.8.1(5) 10-01-20 AI05-0158-1/04 !standard 4.4(3) !standard 4.5.2(3) !standard 4.5.2(5) !standard 4.5.2(27) !class Amendment 09-06-07 !status work item 09-06-07 !status received 09-03-30 !priority Low !difficulty Medium !subject Generalizing membership tests !summary Extend the syntax of membership tests to simplify complex conditions that can be expressed as membership in a subset of values of any type. Introduce syntax for a list of expressions that may be used in loop contexts as well. !problem Conditions of the form (X = A) or else (X = B) or else (X = C) where A, B, C are of some arbitrary type are common, and will be more frequent when pre- and post-conditions become widely used. If the values of A, B,and C are contiguous values of some discrete type, the expression can be written as a membership operation: (X in A..C) . Otherwise this syntactic shortcut is not available. We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. The type does not need to be discrete, and the choices do not need to be static, so that one can write: if C not in 'A' | 'B' | 'O' then Put_Line ("invalid blood type"); else .. while Name (1 .. N) in T.Element (X) | T.Element (Y) | T.Element (Z) loop ... !proposal We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. We propose a simple extension of the syntax of membership operations, so that the right operand can specify a subset of values of some type. The type does not need to be discrete, and the choices do not need to be static, so that one can write: if C not in 'A' | 'B' | 'O' then Put_Line ("invalid blood type"); else .. while Name (1 .. N) in T.Element (X) | T.Element (Y) | T.Element (Z) loop ... !wording Modify 3.8.1 (5) as follows: discrete_Choice ::= choice_expression | discrete_Range | others Modify 4.4(3) as follows: expression := relation {logical_operator relation} choice_expression := choice_relation {logical_operator choice_relation} choice_relation ::= simple_expression [relational_operator simple_expression] relation ::= simple_expression [not] in membership_choice_list membership_choice_list ::= membership_choice {'|' membership_choice} membership_choice ::= choice_expression | range | subtype_mark logical_operator ::= and | and then | or | or else | xor Add after 4.5.2(3): (Name Resolution Rules) If the membership_choice_list in a membership operation has more that one choice, then all choices must have a single common type, which is the tested type of the operation. Add after 4.5.2 (5) (Static semantics) The expected type of a choice_expression in a membership_choice is any type. Modify 4.5.2(27) as follows: If the choice_list has a single choice, the simple expression and the choice are elaborated in arbitrary order. If the choice_list has more that one choice, the left operand is elaborated first and the operation is equivalent to a short-circuit sequence of individual tests on each choice. Modify 4.5.2 (28) as follows: A membership test using in yields the result True if any of the individual tests on the elements of the choice list yields True. An individual membership test yields the result True if: * The choice is an expression, and the simple_expression is equal to the value of the choice. If the type of the expression is a record type or a limited type, the test uses the primitive equality for the type; otherwise the test uses predefined equality. * The choice is a range or subtype mark and: (followed by para. 29-30) !discussion The new syntax creates an upward incompatibility: a choice cannot be an expression, as in Ada2005, but is limited to an expression without a membership operator. This is likely to be harmless, and seems to be preferable to making the grammar ambiguous, as shown in the following example, in which a choice list can contain an arbitrary expression: X : Boolean := .. case X is when Y in 1..10 | 20 => -- ambiguous: -- could be (Y in 1..10) | 20 -- which is otherwise type-incorrect Even though this is a contrived example, unlikely to show up in realistic code, it seems preferable to modify the grammar as shown above. There was a suggestion that the type of an expression in a choice should be non-limited. However, given that currently a subtype_mark in a choice can be of any type, it would be awkward to introduce this restriction. Clearly if the type has no primitive equality a membership test over a list of expressions is illegal. [Editor's note: This decision brings visibility into the legality of membership operations. That seems like a lousy choice (pun intended). Personally, I would prefer the inconsistency. (Well, I'd prefer AI-153-2 instead of this one, which would require these to be discrete only, but make them work *everywhere*. But that's a separate issue.)] !example (See proposal.) !ACATS test Add an ACATS test for this new feature. !appendix From: Robert Dewar Date: Monday, March 30, 2009 8:02 PM How about allowing Pascal style sets while we are at it? pragma Precondition (xyz(3) = 'w' or else xyz(3) = 'x' or else xyz(3) = 'z'); is pretty horrible, now if only we had y instead of z we could write (xyz(3) in 'w' .. 'y') but suddenly for one missing element in the sequence we are driven to the top form How about allowing (xyz(3) in ('w','x','z')) easy to implement, easy to read, and very convenient I would also allow e.g. (xyz(3) in ('d'..'f', 'm', 'r'..'s')) In the case of the GNAT code, we used to have Nkind (X) = N_Node1 or else Nkind (X) = N_Node2 and we defined Nkind_In (X, N_Node1, Node_2) we have nine variants for 2 to 9 arguments, but really it would be much nicer to say Nkind (X) in (N_Node1, N_Node2) I thought of {} instead of (), but I think () is probably more consistent with Ada syntax style. I would allow these forms ONLY in membership tests, and the meaning of Expr in (A, B, C) Expr in (A) or else Expr in (B) or else Expr in (C) if A is an expression Expr in (A) means Expr = A if A is a range or subtype name Epxr in (A) means Expr in A notin by obvious analogy. It is a little tempting to allow them in loops as well for J in (A, B, C) loop but I would settle for just the membership tests **************************************************************** From: Tucker Taft Date: Monday, March 30, 2009 9:09 PM I would prefer to use the "|" symbol, since that is what is already used in choice lists. E.g.: Xyz(3) in ('w' .. 'x' | 'z') Not sure whether the parens should be required. **************************************************************** From: Robert Dewar Date: Monday, March 30, 2009 9:21 PM I am cool with | instead of , And yes, if you go with | instead of , you could indeed leave out the parens. I sort of like the parens (I think of them as {} :-)) But I can go either way. Can I presume that the fact that you are micro-analyzing the syntax means you agree with the basic idea? :-) **************************************************************** From: Tucker Taft Date: Monday, March 30, 2009 9:43 PM I have always been annoyed that I can use multiple choice ranges in a case alternative, but not in other similar contexts, such as membership tests (and possibly "for" loops). I frequently find myself turning what should be a simple membership test into a case statement merely to get the multiple-choice-range feature. So yes, I support using "|" in more contexts. And I agree that preconditions and postconditions will be heavy users of membership tests, and it will be annoying if you can only conveniently talk about a single contiguous range of values. **************************************************************** From: Randy Brukardt Date: Monday, March 30, 2009 10:01 PM > How about allowing > > (xyz(3) in ('w','x','z')) > > easy to implement, easy to read, and very convenient Humm, the devil is in the details here. For what types are these memberships supposed to work? Can they be static? Etc. If they're only for static discrete types (as in case statements, based on Tucker's responses), then I surely agree that they're easy to implement. I'm not so sure in other cases. **************************************************************** From: Robert Dewar Date: Monday, March 30, 2009 10:11 PM > If they're only for static discrete types (as in case statements, > based on Tucker's responses), then I surely agree that they're easy to > implement. I'm not so sure in other cases. I gave a semantic equivalence, and no, it is not acceptable to limit these to static values, but of course they are limited to discrete types, that goes without saying. **************************************************************** From: Tucker Taft Date: Monday, March 30, 2009 10:04 PM If we don't want to require parentheses in a membership test, then we would have to change discrete choice to use a simple_expression rather than an expression: discrete_choice ::= simple_expression | discrete_range | others and then we could change membership test to be: simple_expression [not] in choice_list where (non-discrete) choice and choice_list are: choice ::= simple_expression | range | subtype_mark choice_list ::= choice { '|' choice } **************************************************************** From: Robert Dewar Date: Monday, March 30, 2009 10:17 PM If there are no parens, don't we have confusion in case x is when a in b | c could be when (a in b) | c or when a in (b | c) **************************************************************** From: Tucker Taft Date: Monday, March 30, 2009 10:34 PM Yes, unless we change discrete_choice to use "simple_expression" rather than "expression" which doesn't sound like such a bad idea in any case. **************************************************************** From: Robert Dewar Date: Tuesday, March 31, 2009 8:43 AM Isn't that a clear upwards incompatibility? **************************************************************** From: Tucker Taft Date: Tuesday, March 31, 2009 11:29 AM Yes, it is an upward incompatibility, though probably a reasonable one. discrete_choice is used for case alternatives, variant alternatives, and array aggregates. What you would lose is the ability to use relationals and membership tests in such contexts, presuming the case expression, discriminant, or array index is of a boolean type. Of course these contexts allow no overlap in choices, and the case and variant alternatives require the choice to be static, so trying to figure how to use a membership test or a relational in a discrete_choice would be interesting. E.g.: case X is when A < B => ... when others => ... end case; or Y := (A in 1..10 => 42, others => 43); It is a bit hard to imagine how non-simple_expressions would appear in a legitimate and legal program. Adding parentheses would probably help such programs, should they exist. Note that the suggested change would be upward "consistent," in that the compiler would always catch the problem. **************************************************************** From: Bob Duff Date: Tuesday, March 31, 2009 11:59 AM > If we don't want to require parentheses in a membership test, then we > would have to change discrete choice to use a simple_expression rather > than an expression: > > discrete_choice ::= > simple_expression | discrete_range | others Interestingly, this restriction was present in the Ada 83 syntax, and was deliberately removed. See AARM-4.4: Extensions to Ada 83 ... 15.b In various contexts throughout the language where Ada 83 syntax rules had simple_expression, the corresponding Ada 95 syntax rule has expression instead. This reflects the inclusion of modular integer types, which makes the logical operators "and", "or", and "xor" more useful in expressions of an integer type. Requiring parentheses to use these operators in such contexts seemed unnecessary and potentially confusing. Note that the bounds of a range still have to be specified by simple_expressions, since otherwise expressions involving membership tests might be ambiguous. Essentially, the operation ".." is of higher precedence than the logical operators, and hence uses of logical operators still have to be parenthesized when used in a bound of a range. And AARM-3.8.1: Wording Changes from Ada 83 29.b The syntactic category choice is removed. The syntax rules for variant, array_aggregate, and case_statement now use discrete_choice_list or discrete_choice instead. The syntax rule for record_aggregate now defines its own syntax for named associations. The Ada 83 syntax is: choice ::= simple_expression | discrete_range | others | component_simple_name **************************************************************** From: Tucker Taft Date: Tuesday, March 31, 2009 12:40 PM Yes, I remember us making that change, simply for uniformity, as there was no particular reason for the restriction. Now that we have a reason, well that changes everything! **************************************************************** From: Robert Dewar Date: Tuesday, March 31, 2009 3:25 PM All these interesting comments on the syntax, but what do you think of the basic idea? :-) **************************************************************** From: Bob Duff Date: Tuesday, March 31, 2009 3:39 PM I haven't decided yet. More later... **************************************************************** From: Ed Schonberg Date: Friday, April 24, 2009 8:28 AM Going back to the syntax of set comprehensions: the last proposal was to leave out parentheses and revert to the Ada83 rule that only simple expressions can occur is discrete choices. The fact that this was an incompatibility was dismissed as "probably harmless". We just came up with an instance of a relation in a case statement : an OR of assorted constants in a modular context. An old lesson: any upwards incompatibility is suspicious. So I would prefer to see set comprehensions as parenthesized, |-separated lists, with their own production. For now they can only appear in membership tests, but having an explicit name for them will make it easier at a later date to introduce them in loops, as was suggested. From there to treating them as expressions and having proper set algebra is but an easy step! **************************************************************** From: Robert Dewar Date: Friday, April 24, 2009 9:41 AM Comments 1. I much prefer the alternatives syntax, since we already have it in the language, and it reads very nicely. I do not see these as set comprehensions, but rather just an abbreviation in conditionals, similar to COBOL ' IF A EQUALS 2 OR 3 OR 17 THEN and the equivalent Ada if A in 2 | 3 | 17 then seems nice. 2. I am dubious about allowing sets in loops, but even if we go that far, the | syntax is still natural (and indeed since the keyword is IN, is a smooth extension). 3. I do not like Ed's "easy step", it sounds like a mess to me from all points of view (informal definition, formal definition including type resolution, and implementation), and is simply too large a step, too much outside what Ada is as a language for my taste. 4. Regarding the issue of upwards incompatibility, we have three choices a) Just go back to the Ada 83 syntax. As Ed points out, this did cause an incompatibility in our run time library, where we had when A or B or C => and we have not tried our big test suite with this change (I will try that run at some point). If we decide on doing this, I would suggest that we immediately issue a non-binding interpretation saying that we made a mistake in Ada 95 and Ada 20xx and that we allow only simple expressions in choices, allowing compilers to immediately implement the change now, and avoiding the accumulation of more incompatibility in existing code. b) Just agree to resolve the potentially ambiguous forms in the Ada 95 manner. Messy from a formal point of view, but fine in practice because only strange test programs will contain things like: case Some_Boolean is when A in B | C | D => bla bla c) Muck with the syntax to simply exclude the use of IN or NOT IN in this context, i.e. we introduce a new class of expression, e.g, RRESTRICTED_EXPRESSION which allows unparenthesized use of logical operators but does not allow unparenthesized use of IN or NOT IN. Fairly easy to implement, a bit awkward to define, but not bad, e.g. rexpression ::= rrelation {and rrelation} | rrelation {and then rrelation} | rrelation {or rrelation} | rrelation {or else rrelation} | rrelation {xor rrelation} rrelation ::= simple_expression [relational_operator simple_expression] Here rexpression = restricted_expression and rrelation = restricted_relation (just did not feel like that much typing). From a users point of view, b) or c) are far preferable to a). **************************************************************** From: Ed Schonberg Date: Friday, April 24, 2009 10:41 AM or d) require a parenthesized list. Upwards compatible. Clean and orthogonal. **************************************************************** From: Robert Dewar Date: Friday, April 24, 2009 10:52 AM But my list is predicated on keeping the | syntax. Yes, the parenthesis notation avoids compatibility issues, but I think that is its only advantage. Remember I originally suggested the parentheses, but as soon as Tuck suggested use of the existing choices syntax, that made so much more sense to me, because the usage if A in 2 | 3 .. 6 | 9 then is so nicely parallel to case 2 | 3 .. 6 | 9 => in both cases, we have an implied OR between equality conditions. Anyway, let's see if this discussion can inspire some other ARG members to give their opinions. **************************************************************** From: Robert Dewar Date: Friday, April 24, 2009 12:08 PM I ran a test of our test suite requiring Simple_Expression for choices, number of problems detected = ZERO, so it is only the one case in our own run-time that ran into this problem. **************************************************************** From: Randy Brukardt Date: Friday, April 24, 2009 9:19 PM ... > But my list is predicated on keeping the | syntax. Yes, the > parenthesis notation avoids compatibility issues, but I think that is > its only advantage. Remember I originally suggested the parentheses, > but as soon as Tuck suggested use of the existing choices syntax, that > made so much more sense to me, because the usage > > if A in 2 | 3 .. 6 | 9 then > > is so nicely parallel to > > case 2 | 3 .. 6 | 9 => > > in both cases, we have an implied OR between equality conditions. I agree. If we bother doing this at all, it should look as much as possible like other things already in the language. I don't like introducing an incompatibility for it, however -- it doesn't seem important enough for that. I think Robert's "restricted_expression" idea is probably best; using a membership (*any* membership) in a case statement is pretty weird. If we're going to take an incompatibility, it ought to be there and not in modular operations (which seem much more likely to come up in practice, and indeed you have an example). But sticking exactly with the Ada 95 rules would be OK, too. > Anyway, let's see if this discussion can inspire some other ARG > members to give their opinions. I've done it... **************************************************************** From: Robert Dewar Date: Saturday, April 25, 2009 8:15 AM I implemented the restricted expression approach in our experimental implementation of set membership. It took only 10 lines of code, and with this change we get zero regressions on our own sources and our regression suite (when we force this restriction on unconditionally). **************************************************************** From: Ed Schonberg Date: Sunday, November 1, 2009 10:25 AM Here is an updated version of AI05-0158 (membership tests) incorporating the changes suggested by you and Robert. The new syntax is unambiguous, and presents a minuscule incompatibility with Ada2005. [Editor's note: This is version /02 of the AI.] **************************************************************** From: Tucker Taft Date: Sunday, November 1, 2009 11:13 AM It would help to have some examples. Also, rather than using "restricted_*" where "restricted" is somewhat mysterious, how about: choice_expression ::= choice_relation ... also, rather than using simply "choice_list" and "choice" how about "membership_choice_list" and "membership_choice" to distinguish it from the other kinds of choices? Hence: choice_expression ::= choice_relation {logical_operator choice_relation} choice_relation ::= simple_expression [relational_operator] simple_expression relation ::= choice_relation | simple_expression [not] in membership_choice_list membership_choice_list ::= membership_choice { '|' membership_choice } membership_choice ::= choice_expression | range | subtype_mark ... discrete_choice ::= choice_expression | discrete_range | OTHERS ... expression ::= relation {logical_operator relation} logical_operator ::= AND | AND THEN | OR | OR ELSE | XOR **************************************************************** From: Ed Schonberg Date: Monday, November 2, 2009 1:25 PM Good points all. Here is updated version: [Editor's note: This is version /03 of the AI.] **************************************************************** From: Tucker Taft Date: Monday, November 2, 2009 1:43 PM The wording still talks about "choice_list" but the syntax now uses "membership_choice_list". **************************************************************** From: Randy Brukardt Date: Monday, November 2, 2009 1:48 PM I'll fix it when I post it. [And I did - ED] **************************************************************** From: Steve Baird Date: Tuesday, May 18, 2010 3:33 PM Ed and I are supposed to come up with improved name resolution rules for AI05-0158, preferably before Thursday's conference call. Just to recap, the AI currently says If the membership_choice_list in a membership operation has more that one choice, then all choices must have a single common type, which is the tested type of the operation. and The expected type of a choice_expression in a membership_choice is any type. These rules would disallow F in 10 | X , which seems undesirable. Following the wording proposed earlier for conditional expressions, we could replace the above with: If the membership_choice_list in a membership operation has more than one choice, then the tested type of the membership operation shall be determined by the types of the choices as follows: o if all choices have the same type, then that type is the tested type of the membership operation; o otherwise, if at least one choice has a non-universal type and every other choice has either that type or a universal type which covers that non-universal type, then the non-universal type is the tested type of the membership operation; and The expected type of a choice_expression in a membership_choice is the tested type of the membership operation. I think this would be an improvement, but thought may be needed for dealing with interactions between universal_real and universal_fixed. I think the above rule would disallow X : Some_Fixed_Point_Type:= ... ; begin if X in Duration'Last / 2.0 | 1.0 then ... and we need to decide whether a) this is a problem and b) it warrants fixing. Incidentally, this is less of a problem for conditional expressions because the context of a conditional expression can impose an expected type or a "shall resolve to" type, which would then allow this case. Note that the wording would be a bit more complex, but it would be conceptually cleaner to lump the tested expression in with the choices in determining the tested type of the membership operation. In the above example, we would then consider three expressions X , Duration'Last / 2.0 , and 1.0 instead of just the last two and the problem would (in this particular case) vanish. Wording note. I used the phrase "has the type" above instead of "is of the type" because of my (perhaps mistaken) impression that this is better when talking about a range. If we decide that it is perfectly ok to say that "X .. Y is of the type T", then the wording should be be changed to be consistent with the conditional expressions proposal. I understand that there is no problem with "the type of X .. Y is T" - I'm just not 100% sure that it therefore follows that it is ok to say that "X .. Y is of the type T" because that sounds (perhaps just to me) as if it is saying that "X .. Y is a value of type T", which is incorrect. P.S. There may be a font issue with the current version of the AI. When it says, membership_choice ::= choice_expression | range | subtype_mark , I see "range" displayed as a reserved word rather than as a nonterminal. **************************************************************** From: Gary Dismukes Date: Tuesday, May 18, 2010 6:22 PM > These rules would disallow > > F in 10 | X > > , which seems undesirable. Agreed that such cases should certainly be allowed. This seems closely analogous with membership tests with ranges (F in 10 .. X), so it seems like we should be able to use similar resolution rules for these new guys. However, after looking at the rules, I'm scratching my head about how the existing wording works for such membership tests. The rules are based on the "type of the range", but it's unclear how that gets determined in "mixed-type" cases like the above. (Relevant paragraphs are 4.5.2(3/2) and 3.5(4-5).) BTW, I discussed this with Steve, and it's similarly unclear to him, but maybe someone else can shed some light. > Following the wording proposed earlier for conditional expressions, we > could replace the above with: > > If the membership_choice_list in a membership operation has more than > one choice, then the tested type of the membership operation shall be > determined by the types of the choices as follows: > o if all choices have the same type, then > that type is the tested type of the membership operation; > o otherwise, if at least one choice has a > non-universal type and every other choice has either > that type or a universal type which covers > that non-universal type, then the non-universal type is the > tested type of the membership operation; > > and > > The expected type of a choice_expression in a membership_choice > is the tested type of the membership operation. Maybe OK, but it would be nice to have something simpler. > I think this would be an improvement, but thought may be needed for > dealing with interactions between universal_real and universal_fixed. > I think the above rule would disallow > > X : Some_Fixed_Point_Type:= ... ; > begin > if X in Duration'Last / 2.0 | 1.0 then ... > > and we need to decide whether a) this is a problem and b) it warrants > fixing. Incidentally, this is less of a problem for conditional > expressions because the context of a conditional expression can impose > an expected type or a "shall resolve to" type, which would then allow > this case. I can't get too excited about that kind of case. > Note that the wording would be a bit more complex, but it would be > conceptually cleaner to lump the tested expression in with the choices > in determining the tested type of the membership operation. > In the above example, we would then consider three expressions > X , Duration'Last / 2.0 , and 1.0 > instead of just the last two and the problem would (in this particular > case) vanish. Again, it would be nice if we could adapt the resolution rules used for range membership tests (assuming those aren't broken:-). **************************************************************** From: Steve Baird Date: Tuesday, May 18, 2010 6:31 PM > Again, it would be nice if we could adapt the resolution rules used > for range membership tests (assuming those aren't broken:-). I agree. If we can get a better understanding of how existing range membership tests work (and if, as you mentioned, those rules aren't broken), then we should definitely take another look at the rules for the new forms of membership tests with that understanding in mind. **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 8:01 PM Range memberships are the only three-way resolution in Ada, where there are three expressions that must have the same type but there is no information from the context as to what types are required. (Parameters differ in this aspect, because the allowed types for parameters come from the possible interpretations of the subprogram, even for procedures.) This was a horrible mess to implement in our strictly top-down resolution scheme. This must be required by the ACATS, because there is no way we would have done that mess without some sort of requirement. How the wording implies that is required might indeed be a mystery; this could very well be a place where the rules of the language are determined by the ACATS and not the wording. **************************************************************** From: Tucker Taft Date: Tuesday, May 18, 2010 8:40 PM > This seems closely analogous with membership tests with ranges (F in > 10 .. X), so it seems like we should be able to use similar resolution > rules for these new guys. However, after looking at the rules, I'm > scratching my head about how the existing wording works for such > membership tests. The rules are based on the "type of the range", but > it's unclear how that gets determined in "mixed-type" cases like the > above. (Relevant paragraphs are 4.5.2(3/2) and 3.5(4-5).) > > BTW, I discussed this with Steve, and it's similarly unclear to him, > but maybe someone else can shed some light. There is no requirement that we state how the types in name resolution get determined. If you wish you try every possible type, and if you find one that satisfies all of the name resolution rules, then that is an acceptable interpretation. I agree we may be able to use similar wording as that which is used with ranges. Essentially we presume there is a "type of the range" and we say that the expected type for each bound of the range is that type. If there are multiple types for which that works, then you got trouble, unless root integer or root real is one of them, in which case that one is preferred. One significant difference is that ranges are only for scalar types. This new feature is proposed for composite types. But I don't think that really matters. The resolution rules for "X" | "Y" are no worse than 'X' .. 'Y' because in both cases, there are multiple types in package Standard which would match both of these, and so they are clearly ambiguous. And we want F | "X" to work as well as F .. 'X', so again, making the rules similar to ranges seems like a good thing. We might have to say something about "null in new X | new Y" but that seems like the only one that could possibly cause a real problem. Perhaps we should simply disallow allocators in this construct. There seems no reason to provide a preference for universal access, since no one in their right mind has a strong desire to write: null in null | null We can simply declare that ambiguous (or True) and be done with it. >> Following the wording proposed earlier for conditional expressions, >> we could replace the above with: >> >> If the membership_choice_list in a membership operation has more than >> one choice, then the tested type of the membership operation shall be >> determined by the types of the choices as follows: >> o if all choices have the same type, then >> that type is the tested type of the membership operation; >> o otherwise, if at least one choice has a >> non-universal type and every other choice has either >> that type or a universal type which covers >> that non-universal type, then the non-universal type is the >> tested type of the membership operation; >> >> and >> >> The expected type of a choice_expression in a membership_choice >> is the tested type of the membership operation. > > Maybe OK, but it would be nice to have something simpler. I think all we need to say is that last part, namely The expected type for each choice_expression in a membership_choice_list is the tested type of the membership expression. We don't need to tell the implementation how to figure out the tested type, in the same way we don't tell them how to figure out the type of a range. We can simply say that a membership test has a tested type. If there is more than one type that satisfies the name resolution rules that govern the tested type, then it's ambiguous, unless one of the interpretations is a root numeric type, which is preferred. **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 3:33 PM ... > I think all we need to say is that last part, namely > > The expected type for each choice_expression in > a membership_choice_list is the tested type of the > membership expression. > > We don't need to tell the implementation how to figure out the tested > type, in the same way we don't tell them how to figure out the type of > a range. We can simply say that a membership test has a tested type. > If there is more than one type that satisfies the name resolution > rules that govern the tested type, then it's ambiguous, unless one of > the interpretations is a root numeric type, which is preferred. That would meaning requiring the ability to match an infinite number of possible types in any combination in cases with no context. Ada currently only requires three such expressions (in the X in Y .. Z case), and the expressions have to be scalar. That took 400+ lines of code in Janus/Ada alone; I don't see any sane way to generalize it. (The problem cases are things like "10 in X .. Y", where there is little useful type information from the "tested expression".) Even ignoring the implementation burden (which is likely to vary for other compilers), cases with many matching expressions is an invitation to write tricky code. It's possible to write expressions that resolve only because a single routine in a set of heavily overloaded ones is omitted. That doesn't seem necessary to make the goals of memberships possible. I would be against making resolution even more complicated just for this specific unusual case. I'd suggest making the tested expression a single type context, except that is neither compatible nor quite general enough (it would be consistent with attributes, which is the most similar other cases in the language). But something on that line (where the type is at least heavily restricted based on the tested expression) would seem to be more sensible than the current rules (and I'd suggest we consider extending that to all versions of the membership test). **************************************************************** From: Tucker Taft Date: Tuesday, May 18, 2010 9:29 PM I wonder why this is significantly harder than: if 10 = X + Y + Z + 3 + 77 + G + Q + R then ... These all have to be of the same integer type, and every one of these might be overloaded. What makes the following significantly harder? if 10 in X | Y | Z | 3 | 77 | G | Q | R then ... I realize that "=" and "+" need to be directly visible, but does that really simplify the first one that much? We can eliminate that difference with: if A and then B and then C and then D then where A, B, C, and D all need to be of the same boolean type. It seems like there are many constructs where you have more than two operands that all need to be of the same type. Perhaps the key thing is to think of "|" as a binary operator rather than as an n-ary operator in your resolution algorithm. Perhaps that would make the resolution less of a special case. Keeping the description of the resolution rules relatively simple and consistent with other rules in similar contexts, even if they imply a bit more work on the implementor's behalf, is better for the language. **************************************************************** From: Edmond Schonberg Date: Tuesday, May 18, 2010 9:40 PM I was away from the office today, and missed this discussion. As far as I can see, there are no implementation problems at all with the proposed rules, all of this falls out of the existing resolution machinery in GNAT. I agree with Tuck that the primary concern should be simplicity of description, there is nothing fundamentally new here concerning overload resolution. **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 10:19 PM > I wonder why this is significantly harder than: > > if 10 = X + Y + Z + 3 + 77 + G + Q + R then ... > > These all have to be of the same integer type, and every one of these > might be overloaded. > > What makes the following significantly harder? > > if 10 in X | Y | Z | 3 | 77 | G | Q | R then ... > > I realize that "=" and "+" need to be directly visible, but does that > really simplify the first one that much? Yes. In the first case, there is the context of "any Boolean", which is then used to prune the set of possible "=" operators. Then each "=" operator is used (in turn) to resolve the parameters. In this case, the "10" doesn't cause many to fail, so there are a lot of attempts on the RHS. But it is just repeating the same thing over and over. Moreover, in practice it doesn't ever get bad (other than in ACATS-type tests). We didn't believe this in the early days of our compiler, and added various things to avoid redoing work; but eventually I found that the slowness we were trying to eliminate came almost solely from the various techniques to avoid redoing work -- less than 1% of work ever was redone, and it wasn't worth any overhead to avoid that. In the second case, we have just air to start with; there is no information at all about what to resolve to. The only way that I can see to resolve this with the current technology would be a create a list of all possible types and try each of them in turn. That would be several orders of magnitude more than any operator (even "=" rarely has more than 20-30 copies visible). And I wonder whether that could handle anonymous types, classwide types, and the like properly. > We can > eliminate that difference with: > > if A and then B and then C and then D then > > where A, B, C, and D all need to be of the same boolean type. I don't see anything eliminated here. "and then" is a always visible Boolean operator for the purposes of resolution. In fact, it is that for *all* purposes within Janus/Ada: it's just an operator. They're even declared in the symboltable (with names of "At" and "Others" so that nothing can hit them in a lookup). Since we chain all of the declarations with the same defining name, we can get the list of them trivially, and that gives us a small list of possibles to resolve to. > It seems like there are many constructs where you have more than two > operands that all need to be of the same type. > Perhaps the key thing is to think of "|" as a binary operator rather > than as an n-ary operator in your resolution algorithm. Perhaps that > would make the resolution less of a special case. The problem is not the operator, the problem is the lack of context. Even treating this as a operator, it would have to exist for all types in the closure of a program (you can use a membership on a type that you don't have visibility on), so the number is gigantic. If you're able to do that, you could just do that in the first place -- but I suspect that anonymous types and classwide types would cause problems. > Keeping the description of the resolution rules relatively simple and > consistent with other rules in similar contexts, even if they imply a > bit more work on the implementor's behalf, is better for the language. I agree. The most similar context (which is to say *no* context) to a membership is the prefix of an attribute. It's the current rules for memberships (or lack of rules) that is the anomoly. It's essentially the only context in Ada where there is no information from the context *and* there is no requirement for a "single type". That makes it very unusual. In any case, I don't think that the impossibility of implementing this in Janus/Ada is that big of a deal for the language. It's the possibility to write incredibly tricky expressions that really bothers me. Do we really want: if F1 in F2 | F3 | F4 | F5 then to resolve if F1 is defined for Integer/Float/Character/String, F2 for Integer/Float/Duration/String, F3 for Float/Character/Duration/String, F4 for Integer/Float/Duration/Character, and F5 for Integer/Float/Character/Duration?? I've already decided I hate this proposal because it makes it impossible to ever add proper set constraints to the language. (Such constraints need to be discrete and static in order to be able to affect the value set in sensible ways, there is no way to insist on such requirements after adding this syntax for memberships.) So maybe my opinion doesn't matter anymore. **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 10:23 PM > I was away from the office today, and missed this discussion. > As far as I can see, there are no implementation problems at all with > the proposed rules, all of this falls out of the existing resolution > machinery in GNAT. I agree with Tuck that the primary concern should > be simplicity of description, there is nothing fundamentally new here > concerning overload resolution. I agree there is nothing new: membership is already broken, and you guys want to make it *more* broken. There are no other cases like it in Ada now, and the harm to understandability that the amazingly complex resolution brings is all out of proportion to the value of its use (which is simple subtypes and ranges made up of literals). These complex cases never happen in real use, so why burden both implementations *and* readers with the possibility?? **************************************************************** From: Tucker Taft Date: Tuesday, May 18, 2010 10:30 PM We do overload resolution bottom-up, so we never need to create a list of possible types. That may be why this kind of thing doesn't add any significant complexity to our overload resolution. **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 10:37 PM > I agree there is nothing new: membership is already broken, and you > guys want to make it *more* broken. There are no other cases like it > in Ada now, and the harm to understandability that the amazingly > complex resolution brings is all out of proportion to the value of its > use (which is simple subtypes and ranges made up of literals). > These complex cases never happen in real use, so why burden both > implementations *and* readers with the possibility?? Let me be more constructive. The problem here is that the rules assume that there is a way to determine the tested type from the part of the membership following the "in", when that is not actually true. Perhaps we should fix this for this new construct by adding the name of a type: E1 in Integer when E2 | E2 Then E2 and E3 have to resolve to Integer, and finally E1 has to resolve to Integer. We no longer have to guess the type to resolve to. It would be too much to ask to require this for existing memberships (although it would be a good idea). **************************************************************** From: Edmond Schonberg Date: Wednesday, May 19, 2010 8:17 AM Wouldn't it be simpler to just require the left operand to be a complete context? simple to state and implement,. **************************************************************** From: Randy Brukardt Date: Wednesday, May 19, 2010 12:51 PM I thought about that, but if applied to memberships in general, it would prevent something like 10 in Some_Subtype which seems useful and has no resolution issues as the RHS provides a unique (sub)type. I also wonder what that would do to the new uses of memberships for accessibility checking (as anonymous access types always need to be converted to something). This latter I'd have to think about some more, by going back to that AI and looking at the examples. Doing it just for the new memberships would probably be fine, but slightly inconsistent. The existing wording for resolution of memberships implies that the RHS provides a type; if we could preserve that we wouldn't need to make any other incompatible change or add an inconsistency. **************************************************************** From: Tucker Taft Date: Wednesday, May 19, 2010 1:30 PM That's most useful with named numbers. And those resolve just fine to universal integer. For compatibility, if we have a single membership choice, then we should stick with the current rules where the type comes from the RHS (at least officially). But if there is more than one choice, we could say the type comes from the LHS, and it can't be one of those things that requires context, like a string literal or a character literal or an allocator. That doesn't seem too nasty. **************************************************************** From: Bob Duff Date: Wednesday, May 19, 2010 7:39 PM In ARG e-mails, we should use AI numbers AND titles in the subject line. Please -- I'm too old to remember what AI-158 means. ;-) Although I can guess from the discussion content... Tucker Taft wrote: > There is no requirement that we state how the types in name resolution > get determined. ... Right. We (language designers) just give a set of simultaneous equations in N variables, and let the implementers figure out how to solve them. AARM hints (as we have in the case of assignment_statement) are nice. Edmond Schonberg wrote: > I was away from the office today, and missed this discussion. I was NOT out of the office, at least not any more than usual (i.e. I am ALWAYS out of THE office), but I deliberately ignored the discussion until now. ;-) >...As far > as I can see, there are no implementation problems at all with the >proposed rules, all of this falls out of the existing resolution >machinery in GNAT. I agree with Tuck that the primary concern should >be simplicity of description, there is nothing fundamentally new here >concerning overload resolution. I agree. On the other hand, I'm not entirely opposed to Randy's view, that having type information flow right-to-left in the syntax tree is not really helpful in terms of readability. Randy Brukardt wrote: > Perhaps we should fix this for this new construct by adding the name > of a > type: > > E1 in Integer when E2 | E2 > > Then E2 and E3 have to resolve to Integer, and finally E1 has to > resolve to Integer. We no longer have to guess the type to resolve to. But this is just: Integer'(E1) in E2 | E2 Whatever the resolution rules, we already have syntax to nail down the type of E1. Randy Brukardt wrote: > In any case, I don't think that the impossibility of implementing this > in Janus/Ada is that big of a deal for the language. I think that's a key (political) point! Difficulties for Janus/Ada are interesting as a data point, but we should not outlaw new features simply because of difficulties for Janus/Ada. Sorry, Randy, but I think that makes sense at this point. On the other hand: >...It's the possibility to > write incredibly tricky expressions that really bothers me. Do we >really > want: > > if F1 in F2 | F3 | F4 | F5 then > > to resolve if F1 is defined for Integer/Float/Character/String, F2 for > Integer/Float/Duration/String, F3 for Float/Character/Duration/String, > F4 for Integer/Float/Duration/Character, and F5 for > Integer/Float/Character/Duration?? I'm generally in agreement with Randy's discomfort with this sort of thing, purely from a readability point of view (nothing to do with implementation difficulties). **************************************************************** From: Edmond Schonberg Date: Wednesday, May 19, 2010 7:53 PM > I'm generally in agreement with Randy's discomfort with this sort of > thing, purely from a readability point of view (nothing to do with > implementation difficulties). One can feel plenty of discomfort with this kind of code, but a) no one writes this sort of thing (expect eager ACATS designers :-)! b) It should just fall out of a proper resolution machinery. We have known since 1982 that the proper resolution algorithm needs two passes: one upwards to collect interpretations, and one downwards to resolve overloading. I don't recall the authors of the paper right now, but the point is that the set of simultaneous equations that Bob mentions can be shown to be solved by just these two passes, and not require some arbitrary iteration, which is rather nice. c) without some notion of set intersection I have no idea how one can resolve an assignment statement F (X) := G (X); when both sides are overloaded. There are two sets of candidate interpretations for F and G, and you need to find the (hopefully unique) common type between the two sets. Once you have that, the resolution of if F1 in F2 | F3 | F4 | F5 then proceeds by performing these intersections repeatedly to find the common interpretation, if any. In practice of course the alternatives will be constants of a some enumeration type and the heavy machinery will not be used. In any case, there are no implementation difficulties. **************************************************************** From: Randy Brukardt Date: Wednesday, May 19, 2010 8:37 PM > b) It should just fall out of a proper resolution machinery. > We have known since 1982 that the proper resolution algorithm needs > two > passes: one upwards to collect interpretations, and one downwards to > resolve overloading. I don't recall the authors of the paper right > now, but the point is that the set of simultaneous equations that Bob > mentions can be shown to be solved by just these two passes, and not > require some arbitrary iteration, which is rather nice. Whoa! "Proper resolution algorithm"?? What on earth are you talking about? I recall such a paper back in the dark ages, which was really hard to understand and also used way to much memory for our 56K CP/M systems. It simply wasn't practical. In any case, it also turns out that all of this concern about efficiency of resolution is misplaced. On real programs, it simply doesn't matter how it is done. The only problem we ever had with our recursive resolution was with an ACATS test that contained 50 concatenation operators, and the solution was to resolve the parameters in the reverse order. (You could still trigger a problem with an expression with 40+ levels of parentheses and heavily overloaded operators like "&", but I've yet to see such an expression in practice or in the ACATS.) > c) without some notion of set intersection I have no idea how one can > resolve an assignment statement F (X) := G (X); > > when both sides are overloaded. There are two sets of candidate > interpretations for F and G, and you need to find the (hopefully > unique) common type between the two sets. You need more imagination. :-) Since the LHS of an assignment is always into a Name of some sort, we start with the list of solutions for that Name (such a list has to exist, and it is set up before we start any resolution), and then try resolving the RHS for each of the types determined by the solutions (usually, there is only one, of course). There must be exactly one that works (well, it's bit more complex than that because of the preference rules for root_integer/root_real). That doesn't extend well because in other cases we cannot assume the expression is a Name. Nor do we have any way to hold onto any "set" of results; all we are keeping track of is whether exactly one solution has been found and the type of that solution. > Once you have that, the resolution of > if F1 in F2 | F3 | F4 | F5 then > > proceeds by performing these intersections repeatedly to find the > common interpretation, if any. In practice of course the alternatives > will be constants of a some enumeration type and the heavy machinery > will not be used. In any case, there are no implementation > difficulties. As I have repeatedly said, this is not true. There is never a "set" of results to intersect in any place in Janus/Ada; it's all done with code and scalar local variables. In early versions of Janus/Ada (up until 32-bit versions, not widely used until the mid 90s, long after all of the basic parts of our compiler were designed), heap space was limited to 64K and thus was at an extreme premium, thus we *always* used code rather than data structures if that was possible. Sets of arbitrary items are expensive in that context; we only built them when absolutely necessary (the set of possible solutions was one such case, and it was a major problem in terms of memory usage and heap fragmentation). There is a later compiler phase that walks the tree to remove unused interpretations and enforce legality rules; it has nothing to do with resolution. In any case, we're clearly talking past each other. I'm not going to support this with general resolution, period, both because I think the potential cost in readability is dangerous and unnecessary, and because I cannot in good conscience vote for anything that is against the interests of RRS (at least without significant user benefit, and this resolution rule has none). Further discussion of this sort is futile. **************************************************************** From: Tucker Taft Date: Tuesday, May 18, 2010 9:59 PM By the way, with all this talk about the resolution rule for membership tests, I wonder whether we have it right currently: RM 4.5.2(3/2) says: The tested type of a membership test is the type of the range or the type determined by the subtype_mark. If the tested type is tagged, then the simple_expression shall resolve to be of a type that {is convertible (see 4.6) to} [covers or is covered by] the tested type; if untagged, the expected type for the simple_expression is the tested type. The "{}" and the "[]" show the changes made to accommodate membership tests for interface types. We now seem to allow the following: type T1 is tagged ... type T2 is new T1 with ... X : T1; Y : T2; if Y in T1 then ... That was not permitted before this change. You would have had to write: if Y in T1'Class then ... or if T2'Class(Y) in T1 then ... That is, one or the other needed to be classwide, if they weren't the same type. I think we want to preserve that requirement. The "convertible" rule allows conversion between specific types, so long as the target type is an ancestor. [Aside: I think allowing conversion to an ancestor specific type when there are private extensions involved is a bug in the language, since it allows the bypassing of a private type's primitive operations. Allowing conversion to a classwide type doesn't create any such problems. For what that's worth...] **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 10:29 PM Are we allowing: O1 : T'Class; O2 : T; O3 : NT; -- NT is derived from T if O1 in O2 | O3 then ?? If not, this would differ from the similar conditional expression. If so, the complexity of resolution is up another level... **************************************************************** From: Randy Brukardt Date: Tuesday, May 18, 2010 10:32 PM ... > That is, one or the other needed to be classwide, if they weren't the > same type. I think we want to preserve that requirement. Probably best done with a legality rule. Resolution should be as non-specific as possible. (I want it to be a *lot* more non-specific in this case, requiring the tested type to be determinable without any context or reference to the expression. But that's a separate discussion.) ****************************************************************