!standard 5.4(4/3)                                  18-05-08  AI12-0214-1/02
!class Amendment 17-01-09
!status Hold (8-0-1) - 19-10-07
!status work item 18-05-07
!status Hold by Letter Ballot failed (7-3-1) - 18-05-07
!status work item 17-01-09
!status received 16-10-08
!priority Very_Low
!difficulty Hard
!subject Case statements and expressions for composite types

!summary

** TBD.

!problem

Ada has case statements and expressions, which allow testing the value of a
variable. They allow and enforce full coverage checking. On the other hand,
they only work on discrete types, which limits their usefulness somewhat.

!proposal

We allow the type of the case expression to be a composite type. The type has
to have one or more visible components with at least one of the following
kinds of types:

* discrete types;
* access types.

[Editor's note: I considered allowing fixed point types as well, since it is
relatively easy to define a completeness check for them. However, then the
necessary omission of floating point types - which cannot have a useful
completeness check - looks weird.]

We would use aggregate syntax to specify the values for composite types. For
instance:

    type R is record
       A, B : Boolean;
    end record;

    R_Inst : R;

    case R_Inst is
       when (True, True) => ..
       when (True, False) => ..
       when (False, False) => ..
       when (False, True) => ..
       -- Every possibility has been covered.
    end case;

As used below, we allow the use of <> to represent all remaining values of a
component (in this way, it acts like "others" for components), and others can
be used in an aggregate to denote every other component, as in regular
aggregates, and with the same limitations.

    type Arr is array (Natural range <>) of Integer;

    A : Arr := ...;

    case A is
       -- Match when first element is one
       when (1, others => <>) => ...
       -- Match when every element except first is one
       when (<>, others => 1) => ...
    end case;

Note that we allow matching using null/<> for access types.
This allows a case branch where it is known that an access value can be
safely dereferenced:

    type Some_Access is access ...;
    type A_Record is record
       Cnt : Natural;
       Data : Some_Access;
    end record;

    Obj : A_Record;

    case Obj is
       when (1, null) => ..
       when (1, <>) => ... Obj.Data.all ... -- OK.
       when (<>, null) => ..
       when (<>, <>) => ... Obj.Data.all ... -- OK.
    end case;

Option: We could allow string literals in place of an aggregate:

    case S is
       when "begin" => ...;
       when "end" => ...;
       when others => ...;
    end case;

This would require additional syntax (a string literal is not an aggregate),
and it's unclear whether the extra effort is worthwhile.

Syntax

This proposal requires introducing new branches in the
case_statement_alternative and case_expression_alternative rules for lists of
aggregates:

    aggregate_choice_list ::= aggregate_choice {| aggregate_choice}

    aggregate_choice => aggregate {| aggregate}

    case_statement_alternative ::=
        "when" discrete_choice_list => sequence_of_statements
      | "when" aggregate_choice_list "=>" sequence_of_statements

    case_expression_alternative ::=
        "when" discrete_choice_list => dependent_expression
      | "when" aggregate_choice_list "=>" dependent_expression

Name Resolution Rules

In the case of aggregate literals, the type of each subcomponent's value is
the type of the expected subcomponent, as in a regular aggregate.

Legality Rules

Aggregate choices can only be used if the choice_expression has a composite
type. A discrete_choice_list can only be used if the choice_expression has a
discrete type.

The expressions in an aggregate choice must be static, the null literal, or a
box.

An aggregate choice cannot be the same as any preceding choice.

If an aggregate choice A contains a static value for a component C, no
preceding aggregate choice (including in the same choice list) shall contain
a box for the component C if that preceding aggregate choice covers the same
values for other components (considered as a group) as A.
Example:

    case Obj is
       when (1, <>) => -- OK.
       when (2, 2) => -- OK.
       when (<>, 1) => -- OK.
       when (1, 2) => -- Error.
    end case;

A composite type shall have (some) visible components, and at least one of
those components shall have a type that is either a discrete or access type.

[We are requiring that at least one component can have a completeness check
in order to use a case statement/expression. The static completeness check is
the essence of a case statement/expression; a case that cannot ever be
checked that way should be written as an if statement/expression to emphasize
that fact.]

Case coverage for composite types is modified as follows:

* <> in aggregates is treated similarly to "others". No overall coverage
  check is performed for such a component.
* If none of the expressions for a component contain a <>, then the component
  subtype must be discrete and the choices must cover the values of the
  component subtype completely.

Aggregate choices must be disjoint. The set of components not including <>
must not match that of any other aggregate choice that does not include box
in that same set of components.

[If range choices are allowed, this gets more complicated. Probably need a
proper definition of "cover" for each choice.]

Dynamic semantics

The alternative matching the value of the selecting_expression is executed.

[We'll need to update the "covers" wording to handle these cases. - Editor.]

!wording

** TBD.

!discussion

Coverage check

From preliminary discussions, I expect one of the biggest objections to this
proposal to be the lifting of the restriction of the case statement and
expression to work only on values of discrete types.

Lifting it is fundamental to this proposal. There were, however, a number of
different possibilities:

1. Restrict to types that directly have a discrete number of possible values.
   A record containing integer sub-components would be allowed, but not a
   record containing a floating point sub-component.

2. Restrict to types that have at least one discrete component. Don't allow
   direct matching on literals for components that have types that are not
   discrete. That's the choice being made currently.

3. Allow everything, just require the "others" when there is no discrete
   number of possible values.

It is felt that 2 is the most pragmatic choice.

Ranges, subtypes:

When matching discrete types, one can use ranges and subtypes to match a set
of values of the discrete type. The question is, should we allow matching
sub-components in the same way, as in:

    type Rec is record
       A, V : Integer range 1 .. 3;
    end record;

    R : Rec;

    case R is
       when (1 .. 2, 1 .. 2) =>
       when (1 .. 3, 3) =>
       when (3, 1 .. 3) =>
    end case;

Note that this would require moving away from aggregate syntax for choices,
at least formally. We'd then have to duplicate some part of the aggregate
resolution rules because we couldn't reuse them here. Probably the best plan
in this case would be to abandon array aggregates here (those are very
complex) and restrict this feature to record types (at least initially).

Alternatives order rule

We originally considered the simplest possible Legality Rule:

Every use of a static expression for a specific component must precede any
use of <> for that component.

This disallows the following, which intuitively corresponds to a sensible
program:

    case A is
       when (1, 1) => ...
       when (1, <>) => ...
       when (<>, 1) => ...
    end case;

We then considered other options.

First option:

- Alternatives are considered in sequential order.
- If an element of an alternative is less general than the same one in a
  preceding alternative, then the subset of cases handled by this alternative
  and by none of the preceding alternatives, must be non-empty.

This would disallow the following:

    case A is
       when (1, <>) => ...
       when (1, 1) => ... -- This pattern can never be matched
    end case;

But allow these:

    case A is
       when (1, <>) => ...
       when (<>, 1) => ...
    end case;

    case A is
       when (<>, 1) => ...
       when (1, <>) => ...
    end case;

In the example above, both case statements are valid, and for the value
(1, 1), a different code path will be executed, making the branches order
sensitive, which goes against the current design of the case statement. This
is the way pattern matching works in OCaml/Haskell/etc.

Second option:

- An alternative cannot appear twice.
- If an element of an alternative is less general than the same one in a
  preceding alternative, then the subset of cases allowed by both
  alternatives must be covered by a preceding alternative.

In this option, we're constraining the order of the alternatives for
readability, e.g. we force the user to go from less general to more general
matches, but the order of alternatives has no direct influence on the code
that will be executed in the end, making the proposal more in line with the
current case statement and expression.

    case A is -- ILLEGAL
       when (1, <>) => ...
       when (<>, 1) => ...
    end case;

    case A is -- ILLEGAL
       when (<>, 1) => ...
       when (1, <>) => ...
    end case;

    case A is -- LEGAL
       when (1, 1) => ...
       when (<>, 1) => ...
       when (1, <>) => ...
    end case;

We chose this alternative, since it gives the most capability while still
requiring that alternatives are statically disjoint.

!ASIS

** TBD.

!ACATS test

Many new ACATS tests would be needed to check that the new capabilities are
supported.

!appendix

From: Raphael Amiard
Sent: Sunday, October 9, 2016  7:41 AM

Here is an AI for a feature proposal I've been drafting with some help. Of
course too late to discuss at this meeting, but it'll let a lot of time for
people to look at it until the next one though !

[This is version /01 of the AI - Editor.]

****************************************************************

From: Tucker Taft
Sent: Thursday, October 13, 2016  10:14 AM

Did you consider the syntax:

    when (True, <id>) =>

where <id> is declaring id to represent what "<>" would have represented on
its own?
I think we also talked about:

    when R : (True, <>) =>

where you now use R.blah to refer to parts matched by <>

I find the "declare ... when ..." syntax too verbose, and think the
"when R : ( ... ) =>" syntax the most consistent with how exception
occurrences are declared now.

****************************************************************

From: Raphael Amiard
Sent: Thursday, October 13, 2016  10:29 AM

> Did you consider the syntax:
>
>   when (True, <id>) =>

No, we didn't think about that. On the one hand, I like it because it's very
concise, coherent with the unnamed case, and quite clear about what this
does ! On the other hand I'm worried that it will make the lexer's work a bit
harder, since here "<A>" can be parsed either as "Op(LT), Id(A), Op(GT)" or
as "Pattern_Match_Id(A)".

I'll try and implement this in libadalang's parser, to see what the
repercussions are.

>
> where <id> is declaring id to represent what "<>" would have
> represented on its own?
>
> I think we also talked about:
>
>   when R : (True, <>) =>
>
> where you now use R.blah to refer to parts matched by <>
>
> I find the "declare ... when ..." syntax too verbose, and think the "when R :
> ( ... ) =>" syntax the most consistent with how exception occurrences
> are declared now.

Yes we did. I think that being able to name the top level object is a great
capability, so I'll add it to the AI. I don't however, as explained in the
previous exchanges, think that it is a substitute for sub-component matching.
If you want I can submit my rationale on the ARG thread.

****************************************************************

From: Tucker Taft
Sent: Thursday, October 13, 2016  10:40 AM

...
> On the other hand I'm worried that it will make the lexer's work a bit
> harder, since here "<A>" can be parsed either as "Op(LT), Id(A), Op(GT)"
> or as "Pattern_Match_Id(A)".

This one is pretty easy, because you can actually lex it as LT, Id, GT.
But you just have to distinguish in the parser between unary "<" and binary ">" which is pretty easy. We make that sort of distinction all the time. > I'll try and implement this in libadalang's parser, to see what the repercussions are. I would be surprised if it is difficult to handle in the parser. I don't see any need to alter the lexer for this. ... >> I find the "declare ... when ..." syntax too verbose, and think the >> "when R : ( ... ) =>" syntax the most consistent with how exception >> occurrences are declared now. > > Yes we did. I think that being able to name the top level object is a > great capability, so I'll add it to the AI. I don't however, as > explained in the previous exchanges, think that it is a substitute for > sub-component matching. If you want I can submit my rationale on the ARG thread. Yes, please do, as I don't remember why you think it is not an adequate substitute. **************************************************************** From: Raphael Amiard Sent: Thursday, October 13, 2016 10:41 AM ... >This one is pretty easy, because you can actually lex it as LT, Id, GT. But >you just have to distinguish in the parser between unary "<" and binary ">" >which is pretty easy. We make that sort of distinction all the time. Yes, you're probably right ! >Yes, please do, as I don't remember why you think it is not an adequate >substitute. Here it is, slightly edited to use the new syntax you proposed - I already love it :) Naming the object that is being matched upon, while it can be useful, is not sufficient. First it's not as expressive. You'll have to repeat the path to the sub (sub-sub) component you wanted to match, which is verbose and possibly error prone. And then, if you want to make a rule out of the fact that you can statically check that the path is correct, the implementation will be more complex, because you'll have to remember the paths and check that what the user is doing is going along those paths. 
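[Illustrative sketch, not part of the original mail: for comparison, this is roughly how the same kind of test has to be written in Ada as it stands today, with one nested case per discriminant and the full component path repeated at the point of use. The type and component names follow the connection example in this thread, but the record is cut down, Natural stands in for Time_T, the "when False" and "when others" variants are added so that the unit compiles, and the procedure name Path_Demo is hypothetical.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Path_Demo is

   type Connection_State is (Init, Connecting, Connected, Disconnected);

   --  Natural is an assumed stand-in for the Time_T of the thread's example.
   type Ping (Has_Ping_Info : Boolean := False) is record
      case Has_Ping_Info is
         when True  => Last_Ping_Time : Natural;
         when False => null;
      end case;
   end record;

   --  Cut-down version of Connection_Info: only the variant needed here.
   type Connection_Info (State : Connection_State := Init) is record
      case State is
         when Connected => Ping_Info : Ping;
         when others    => null;
      end case;
   end record;

   C : constant Connection_Info :=
     (State     => Connected,
      Ping_Info => (Has_Ping_Info => True, Last_Ping_Time => 42));

begin
   --  Each discriminant is tested by its own case statement; nothing ties
   --  these tests to the component selections made further down.
   case C.State is
      when Connected =>
         case C.Ping_Info.Has_Ping_Info is
            when True =>
               --  The path C.Ping_Info.Last_Ping_Time is spelled out again.
               --  Only a run-time discriminant check guards this selection.
               Put_Line ("Ping time is"
                         & Natural'Image (C.Ping_Info.Last_Ping_Time));
            when False =>
               Put_Line ("Connected, no ping information");
         end case;
      when others =>
         null;
   end case;
end Path_Demo;
```

[If the inner selection named a different object, or sat outside the nested case alternatives, the unit would still compile and the mistake would surface only as Constraint_Error at run time. That is the situation the path-repetition argument is concerned with.]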
Taking the realistically complex example of the connection I showed earlier:

    type Connection_State is (Init, Connecting, Connected, Disconnected);

    type Ping (Has_Ping_Info : Boolean := False) is record
       case Has_Ping_Info is
          when True =>
             Last_Ping_Time : Time_T;
             Last_Ping_Id : Ping_Id;
       end case;
    end record;

    type Connection_Info (State : Connection_State) is record
       Server : Internet_Address;
       case State is
          when Connected =>
             Session_Id : Unbounded_String;
             Ping_Info : Ping;
          when Connecting =>
             When_Initiated : Time_T;
          when Disconnected =>
             When_Disconnected : Time_T;
          when Init => null;
       end case;
    end record;

    C : Connection_Info;

    case C is
       when (Connected, <S_Id>, (True, <>, <Ping_Time>)) =>
          Put_Line ("Connected ! Session Id is " & S_Id
                    & " Ping time is " & Ping_Time'Image);
       when others => null;
    end case;

Contrast with only top-level object naming:

    case C is
       when CC : (Connected, <>, (True, <>, <>)) =>
          Put_Line ("Connected ! Session Id is "
                    & CC.Session_Id & " Ping time is "
                    & C.Ping_Info.Last_Ping_Time'Image);
          -- Woops, I used the original name rather than the matched
          -- name ! The compiler will silently ignore my error.
       when others => null;
    end case;

Having to repeat the path is less readable and more error prone. You go
through the trouble of expressing the pattern, just to have to repeat the
logic underneath, effectively writing the paths twice, once in aggregate
syntax, the other in prefix syntax.

Statically ensuring that the accessed information is correct will be much
more work for the compiler.

Not to mention, that would be (yet another) feature that we don't implement
like other languages.

****************************************************************

From: Randy Brukardt
Sent: Monday, January 9, 2017  6:45 PM

(Replying to an old thread that I must have missed back in October:)

...
>Contrast with only top-level object naming:
>
>case C is
>    when CC : (Connected, <>, (True, <>, <>)) =>
>       Put_Line ("Connected !
Session Id is " > & CC.Session_Id & " Ping time is " > & C.Ping_Info.Last_Ping_Time'Image); > -- Woops, I used the original name rather than the matched > -- name ! The compiler will silently ignore my error. What error? C and CC are views of the same object, and clearly have the same value. If there is an error here, it is declaring CC in the first place (see below). > when others => null; > end case; One would want these shorthands in cases where the name of the original object is too complex. If, for instance, the original object was a function call with parameters, then the shorthand makes sense: case Get_Connection (From => Server) is ... -- Rest as above. But in this case, if you mistyped the identifier, the compiler will give you an error. So I don't see any real problem with mistakes here. Keep in mind that every identifier (and every entity for that matter) that one declares adds to the cognitive load of the reader. You really shouldn't do it unless it actually helps reading the code. (Exactly where that line is obviously is a personal choice, but it is far away from renaming a single character identifier.) >Having to repeat the path is less readable and more error prone. You go >through the trouble of expressing the pattern, just to have to repeat >the logic underneath, effectively writing the paths twice, once in >aggregate syntax, the other in prefix syntax. Arguably, that's a good thing. BTW, Ada doesn't currently allow positional <> aggregate components, and I'd suggest that we retain that rule here. (Assuming you really want to model these patterns as aggregates.) [Especially as many style guides ban positional record aggregates altogether.] Therefore, your example would have to be written something like: case Get_Connection (From => Server) is when CC : (State => Connected, Server => <>, Session_Id => <>, Ping_Info => (Has_Ping_Info => True, Last_Ping_Time => <>, Last_Ping_Id => <>)) => Put_Line ("Connected ! 
Session Id is " & CC.Session_Id & " Ping time is " & CC.Ping_Info.Last_Ping_Time'Image); So the names you need are already in the source. Declaring more names would just be more noise. [Aside: in writing the above, I see that your original example doesn't have enough components (the Server component seems to have been left out). Which is why many style guides require component names ... ;-)] >Statically ensuring that the accessed information is correct will be >much more work for the compiler. There seems to be no need to do that. Again, this is just a different view of an existing object, we really should not care which of those views is accessed. >Not to mention, that would be (yet another) feature that we don't >implement like other languages. You already know what I think about that: if you want to use some other language, do that. Don't try to mess up Ada with the exact features of other languages; whatever we do should fit into the Ada model and not look like it was stolen from someone else. (That's why some form of case coverage is mandatory for this feature.) Tucker's idea seems to fit with the existing syntax of the language, and seems to be sufficient for the job. **************************************************************** From: Raphael Amiard Sent: Sunday, February 12, 2017 6:45 AM Thank you for your answer Randy, and sorry I took so long to answer ! I have been swamped with other work at AdaCore, currently depiling my ARG work pile :) >> case C is >> when CC : (Connected, <>, (True, <>, <>)) => >> Put_Line ("Connected ! Session Id is " >> & CC.Session_Id & " Ping time is " >> & C.Ping_Info.Last_Ping_Time'Image); >> -- Woops, I used the original name rather than the matched >> -- name ! The compiler will silently ignore my error. > What error? C and CC are views of the same object, and clearly have > the same value. If there is an error here, it is declaring CC in the > first place (see below). 
We at least agree on that, in that case :) CC is useless. My example was
probably not such a good one. See below.

> One would want these shorthands in cases where the name of the
> original object is too complex. If, for instance, the original object
> was a function call with parameters, then the shorthand makes sense:
>
> case Get_Connection (From => Server) is
>    ... -- Rest as above.
>
> But in this case, if you mistyped the identifier, the compiler will
> give you an error. So I don't see any real problem with mistakes here.

This is not about mistyping, it is about accessing a component that is
statically valid, but dynamically invalid due to discriminants. Let me amend
a previous example:

    C, C2 : Connection_Info

    case C is
       when (Connected, <>, <>, (True, <>, <>)) =>
          Print_Ping_Time (C2.Ping_Info.Ping_Time)

Here you're accessing the wrong object altogether (C2). This is valid Ada, so
it is pretty impossible to emit a warning, even though the code is clearly
wrong. Arguably the programmer should have used more descriptive names. He
should also have not made errors. The job of the compiler is to help him, and
that's a great opportunity to do so.

With introducing a binding, both writing the code and checking it is easier:

    C, C2 : Connection_Info

    case C is
       when (Connected, <>, <>, (True, <>, <Ping_Time>)) =>
          Print_Ping_Time (Ping_Time)

> Keep in mind that every identifier (and every entity for that matter)
> that one declares adds to the cognitive load of the reader. You really
> shouldn't do it unless it actually helps reading the code. (Exactly
> where that line is obviously is a personal choice, but it is far away
> from renaming a single character identifier.)

Yes, I agree with that line of reasoning. The C/CC example was a straw-man,
of the alternative proposal I don't like.
In the example above, I feel like the "Ping_Time" binding that is introduced, both by it's strong locality and because of the static guarantees associated to it, helps the user understand the code and make sure it's correct. >> Having to repeat the path is less readable and more error prone. You go >> through the trouble of expressing the pattern, just to have to repeat the >> logic underneath, effectively writing the paths twice, once in >> aggregate syntax, the other in prefix syntax. > Arguably, that's a good thing. Let's argue then :) I see no benefit in repeating the path, only potential for errors, both for the writer and for the reader of the code. > BTW, Ada doesn't currently allow positional <> aggregate components, > and I'd suggest that we retain that rule here. (Assuming you really > want to model these patterns as aggregates.) [Especially as many style > guides ban positional record aggregates altogether.] Therefore, your > example would have to be written something like: > > case Get_Connection (From => Server) is > when CC : (State => Connected, Server => <>, > Session_Id => <>, > Ping_Info => (Has_Ping_Info => True, Last_Ping_Time => <>, > Last_Ping_Id => <>)) => > Put_Line ("Connected ! Session Id is " > & CC.Session_Id & " Ping time is " > & CC.Ping_Info.Last_Ping_Time'Image); > > So the names you need are already in the source. Declaring more names > would just be more noise. This does not make sense. The introduction of new names is used to introduce new bindings. If you have a record "Line" with two "Points" components, who themselves have X and Y components, if you want to match on the 4 subvalues the components names are not going to be enough. > [Aside: in writing the above, I see that your original example doesn't have > enough components (the Server component seems to have been left out). Which > is why many style guides require component names ... 
;-)] >> Statically ensuring that the accessed information is correct will be much >> more work for the compiler. > There seems to be no need to do that. Again, this is just a different view > of an existing object, we really should not care which of those views is > accessed. The point of this feature, in my mind, is that you match the discriminants and the components at the same time. So every new binding you introduce is statically guaranteed to correspond to something in the matched value. If you combine that with: 1. A style rule (that can easily be statically checked) that it is forbidden to access variable components of a record via the regular dot notation, eg. you have to use matching. 2. A legality rule that it is forbidden to mutate the object that you're matching upon (similar to the rule about renamings of discriminated records component if I remember correctly) Then you get a style of programming where it is possible to statically guarantee that the user cannot illegally access a component of a discriminated record. This is the situation in languages such as OCaml, and it is a highly desirable one IMHO. I think it is possible to reach this goal without introducing new bindings, eg. you need to check the components paths used inside case branches, but the specification and implementation of such a feature will be harder as far as I can tell. >> Not to mention, that would be (yet another) feature that we don't implement >> like other languages. > You already know what I think about that: if you want to use some other > language, do that. Don't try to mess up Ada with the exact features of other > languages; whatever we do should fit into the Ada model and not look like it > was stolen from someone else. (That's why some form of case coverage is > mandatory for this feature.) Yes, I agree that similarity to other languages is not a strong argument. 
However, there often was a good reason why a feature was expressed in a certain way in another language, especially when this language is a language where the type safety was given a lot of thought, as ML and Haskell are. We should take some time to consider it. Here the reason of introducing new bindings is not (solely) expressivity, it's safety, a characteristic we care deeply about. > Tucker's idea seems to fit with the existing syntax of the language, and > seems to be sufficient for the job. I strongly disagree with that. Tucker's idea is insufficient to guarantee safety, which is one of the key points of this feature, not expressivity. I'm waiting for a counter argument :) **************************************************************** From: Randy Brukardt Sent: Monday, February 13, 2017 5:10 PM > Thank you for your answer Randy, and sorry I took so long to answer ! > I have been swamped with other work at AdaCore, currently depiling my > ARG work pile :) Real work being more important than ARG fun -- what a concept! :-) ... > This is not about mistyping, it is about accessing a component that is > statically valid, but dynamically invalid due to discriminants. Let me > amend a previous example: > > C, C2 : Connection_Info > > case C is > when (Connected, <>, <>, (True, <>, <>)) => > Print_Ping_Time (C2.Ping_Info.Ping_Time) > > Here you're accessing the wrong object altogether (C2). This is valid > Ada, so it is pretty impossible to emit a warning, even though the > code is clearly wrong. Well, it's only clearly wrong if you know the intent; I have parallel objects like this all the time (often in writing list process). Which I suppose is your point. BTW, you've again ignored the fact that <> can only appear in named notation, and I think that really does make a difference in these examples. (Not to mention that your example has seven components when written this way, not six. :-) > Arguably the > programmer should have used more descriptive names. 
He should also
> have not made errors. The job of the compiler is to help him, and
> that's a great opportunity to do so.
>
> With introducing a binding, both writing the code and checking it is
> easier:
>
> C, C2 : Connection_Info
>
> case C is
>    when (Connected, <>, <>, (True, <>, <Ping_Time>)) =>
>       Print_Ping_Time (Ping_Time)
>
>
> > Keep in mind that every identifier (and every entity for that
> > matter) that one declares adds to the cognitive load of the reader.
> > You really shouldn't do it unless it actually helps reading the
> > code. (Exactly where that line is obviously is a personal choice,
> > but it is far away from renaming a single character identifier.)
>
> Yes, I agree with that line of reasoning. The C/CC example was a
> straw-man, of the alternative proposal I don't like. In the example
> above, I feel like the "Ping_Time" binding that is introduced, both
> by it's strong locality and because of the static guarantees
> associated to it, helps the user understand the code and make sure
> it's correct.

What guarantees? It seems to me that you need those guarantees anytime you
have any sort of binding (the form doesn't matter). That is, the Tucker-style
binding needs the same guarantees.

> >> Having to repeat the path is less readable and more error prone.
> >> You go through the trouble of expressing the pattern, just to have
> >> to repeat the logic underneath, effectively writing the paths
> >> twice, once in aggregate syntax, the other in prefix syntax.
> > Arguably, that's a good thing.
>
> Let's argue then :) I see no benefit in repeating the path, only
> potential for errors, both for the writer and for the reader of the
> code.
>
> > BTW, Ada doesn't currently allow positional <> aggregate components,
> > and I'd suggest that we retain that rule here. (Assuming you really
> > want to model these patterns as aggregates.) [Especially as many
> > style guides ban positional record aggregates altogether.]
> > Therefore, your example would have to be written something like: > > > > case Get_Connection (From => Server) is > > when CC : (State => Connected, Server => <>, > > Session_Id => <>, > > Ping_Info => (Has_Ping_Info => True, Last_Ping_Time => > > <>, Last_Ping_Id => <>)) => > > Put_Line ("Connected ! Session Id is " > > & CC.Session_Id & " Ping time is " > > & CC.Ping_Info.Last_Ping_Time'Image); > > > > So the names you need are already in the source. Declaring more > > names would just be more noise. > > This does not make sense. The introduction of new names is used to > introduce new bindings. If you have a record "Line" > with two "Points" > components, who themselves have X and Y components, if you want to > match on the 4 subvalues the components names are not going to be > enough. What doesn't make sense? You already have to have the component names in the pattern, adding binding names as well is likely be confusing rather than helpful. Side-comment here: The way you have the binding defined, it doesn't seem possible to pass a larger part of the matched record to a subprogram. Consider a modification of the above: case Get_Connection (From => Server) is when CC : (State => Connected, Server => <>, Session_Id => <>, Ping_Info => (Has_Ping_Info => True, Last_Ping_Time => <>, Last_Ping_Id => <>)) => Put_Line ("Connected ! Session Id is " & CC.Session_Id & Display_Ping_Info (CC.Pinf_Info)); In this case, we're using an existing routine to generate the details about the Ping_Information. That's pretty common (after all, one of the likely reasons for having a subrecord is that it gets independently processed). I don't see any way of doing this with your binding short of going back and copying the original selecting information. ... > The point of this feature, in my mind, is that you match the > discriminants and the components at the same time. So every new > binding you introduce is statically guaranteed to correspond to > something in the matched value. 
> > If you combine that with: > > 1. A style rule (that can easily be statically checked) that it is > forbidden to access variable components of a record via the regular > dot notation, eg. you have to use matching. > 2. A legality rule that it is forbidden to mutate the object that > you're matching upon (similar to the rule about renamings of > discriminated records component if I remember correctly) It has to be the latter, especially in your scheme -- it is essentially a renaming of a discriminant-dependent component, so the same rules have to apply. (We adopted that rule for iterators, for instance, for similar reasons.) That means that the selecting expression would have to be "known to be constrained". And this is what I was talking about above: this is a property of *any* binding in such a matching (so long as some discriminant-dependent components are involved); it doesn't really matter about the syntax involved. If you have any non-box matching on a discriminant or discriminant-dependent component, you can't allow the item to be mutable lest the promise implicit in the declaration be violated. So the safety issue is the same either way; it doesn't depend on how the binding(s) are defined. ... > I think it is possible to reach this goal without introducing new > bindings, eg. you need to check the components paths used inside case > branches, but the specification and implementation of such a feature > will be harder as far as I can tell. It's easy, I described it above. It's all in terms of existing Ada terminology ("known to be constrained", "discriminant-dependent component", etc.). It might have to apply to multiple records (which I believe is already the case for renames), but nothing hard or weird about that. ... > > Tucker's idea seems to fit with the existing syntax of the language, > > and seems to be sufficient for the job. > > I strongly disagree with that. 
> Tucker's idea is insufficient to
> guarantee safety, which is one of the key points of this feature, not
> expressivity. I'm waiting for a counterargument :)

Tucker's idea combined with a "known-to-be-constrained" rule works fine to
guarantee safety (as it is the same as an object rename), and indeed that
seems necessary for any sort of binding. So that ends up identical either
way.

OTOH, your proposal doesn't seem to allow both partial matching AND direct
access to the enclosing (sub)record that contains that matching. That seems
to be *less* functionality and *more* complexity. Which makes it a
no-brainer to me, YMMV. ;-)

****************************************************************

From: Raphael Amiard
Sent: Tuesday, February 14, 2017  5:10 PM

>> Thank you for your answer Randy, and sorry I took so long to answer!
>> I have been swamped with other work at AdaCore, currently depiling my
>> ARG work pile :)

> Real work being more important than ARG fun -- what a concept! :-)

It's all fun, with varying degrees of "urgent" attached :)

>>  case C is
>>     when (Connected, <>, <>, (True, <>, <>)) =>
>>         Print_Ping_Time (C2.Ping_Info.Ping_Time)
>>
>> Here you're accessing the wrong object altogether (C2). This is valid
>> Ada, so it is pretty impossible to emit a warning, even though the
>> code is clearly wrong.

> Well, it's only clearly wrong if you know the intent; I have parallel
> objects like this all the time (often in writing list process). Which
> I suppose is your point.

It's not only clearly wrong if you know the intent. If your goal is to
disallow access to a discriminant-dependent component when you don't
statically know that the access is correct (which is the case above), then
you should not access it, regardless of the intent.

> BTW, you've again ignored the fact that <> can only appear in named
> notation, and I think that really does make a difference in these examples.
> (Not to mention that your example has seven components when written
> this way, not six. :-)

Yes, sorry about that. I did not completely ignore it; I think I altered
some of them and not all, very sloppy of me...

>> Yes, I agree with that line of reasoning. The C/CC example
>> was a straw-man, of the alternative proposal I don't like. In
>> the example above, I feel like the "Ping_Time" binding that
>> is introduced, both by its strong locality and because of
>> the static guarantees associated with it, helps the user
>> understand the code and make sure it's correct.

> What guarantees?

The ones outlined above: You can statically check that a component exists
before accessing it.

> What doesn't make sense? You already have to have the component names in
> the pattern; adding binding names as well is likely to be confusing rather
> than helpful.

In that case, we're talking about a point record with no discriminant, so
we don't care about safety; it's completely a style issue, which is
inherently subjective. I can't find a concrete example to discuss, so let's
agree this is not a case that is interesting for this discussion.

> Side-comment here: The way you have the binding defined, it doesn't seem
> possible to pass a larger part of the matched record to a subprogram.
> Consider a modification of the above:
>
>    case Get_Connection (From => Server) is
>       when CC : (State => Connected, Server => <>,
>                  Session_Id => <>,
>                  Ping_Info => (Has_Ping_Info => True,
>                                Last_Ping_Time => <>,
>                                Last_Ping_Id => <>)) =>
>          Put_Line ("Connected ! Session Id is "
>                    & CC.Session_Id & Display_Ping_Info (CC.Ping_Info));
>
> In this case, we're using an existing routine to generate the details about
> the Ping_Information. That's pretty common (after all, one of the likely
> reasons for having a subrecord is that it gets independently processed). I
> don't see any way of doing this with your binding short of going back and
> copying the original selecting information.

To be clear, I'm not arguing that top-level binding is useless; in fact many
languages with pattern matching do offer it. I'm arguing that it is not a
substitute for sub-component bindings, for the reasons outlined before.

This example of yours, while arguably expressive, also shows why it would be
hard to guarantee the property I have outlined above - no access to fields
if you can't guarantee their legality statically. You would have to keep a
shape of the whole data structure, with known and unknown discriminants,
possibly across indirectly nested case statements. This is flow analysis at
this stage, and probably something you don't want to make mandatory at the
language level.

>> If you combine that with:
>>
>> 1. A style rule (that can easily be statically checked) that it is
>>    forbidden to access variable components of a record via the regular
>>    dot notation, i.e. you have to use matching.
>> 2. A legality rule that it is forbidden to mutate the object that
>>    you're matching upon (similar to the rule about renamings of
>>    discriminated record components, if I remember correctly)

> It has to be the latter, especially in your scheme -- it is essentially a
> renaming of a discriminant-dependent component, so the same rules have to
> apply. (We adopted that rule for iterators, for instance, for similar
> reasons.) That means that the selecting expression would have to be "known
> to be constrained".

My list is inclusive, not exclusive. Of course 2. has to be guaranteed with
Tuck's proposal and with mine. However, it is not the main point.

The main point is 1., because this is what will allow us to enforce the
invariant that no component of a discriminated record is accessed if we
don't know statically that it is correct.
With the rule enforced, the code at the beginning:

C, C2 : Connection_Info

case C is
   when (Connected, <>, <>, (True, <>, <>)) =>
       Print_Ping_Time (C2.Ping_Info.Ping_Time)

would be illegal because C2.Ping_Info.Ping_Time would fall under this rule.

If this was really the intent of your code, you'd have to write:

C, C2 : Connection_Info

case C is
   when (Connected, <>, <>, (True, <>, <>)) =>
       Print_Ping_Time
          (case C2 is
              when (Connected, <>, <>, <>, (True, <>, )) => PT)
              when others => No_Ping_Time)

It is more verbose, which is in this case a good thing! You're ensuring
that the programmer handles the error case explicitly.

> So the safety issue is the same either way; it doesn't depend on how the
> binding(s) are defined.

Only because we're not talking about the same safety issue.

>> I strongly disagree with that. Tucker's idea is insufficient to
>> guarantee safety, which is one of the key points of this feature, not
>> expressivity. I'm waiting for a counter argument :)

> Tucker's idea combined with a "known-to-be-constrained" rule works fine to
> guarantee safety (as it is the same as an object rename), and indeed that
> seems necessary for any sort of binding. So that ends up identical either
> way.

It does not, as explained above, guarantee safety of an accessed component
of a discriminated record if that component depends on the discriminant.

> OTOH, your proposal doesn't seem to allow both partial matching AND direct
> access to the enclosing (sub)record that contains that matching.

It does. You just have to put your result in a declare block. You're not
usually one to argue that this added verbosity is actually a big deal, I
think!

declare
   CC : Connection_Info := Get_Connection (From => Server)
begin
   case CC is
      when (State => Connected, Server => <>,
            Session_Id => <>,
            Ping_Info => (Has_Ping_Info => True,
                          Last_Ping_Time => <>,
                          Last_Ping_Id => <>)) =>
         Put_Line ("Connected ! Session Id is "
                   & CC.Session_Id & Display_Ping_Info (CC.Ping_Info));

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 14, 2017  4:16 PM

> To be clear, I'm not arguing that top-level binding is useless; in fact many
> languages with pattern matching do offer it. I'm arguing that it is not a
> substitute for sub-component bindings, for the reasons outlined before.

Well, you have to be careful about making a proposal too complex. I've
learned through much bitter experience that if you come up with a fully
worked out proposal with all of the bells and whistles, you're most likely
to end up with nothing. It probably would have been better to spring this
component matching proposal when the rest of this idea is nearly
finished... :-)

> This example of yours, while arguably expressive, also shows why it would be
> hard to guarantee the property I have outlined above - no access to fields if
> you can't guarantee their legality statically. You would have to keep a shape
> of the whole data structure, with known and unknown discriminants, possibly
> across indirectly nested case statements. This is flow analysis at this
> stage, and probably something you don't want to make mandatory at the
> language level.

Within one of your case statements (or, for that matter, in the scope of a
renames of the component), the Legality Rule already guarantees the property
you want. Indeed, because of the renames solution, there is a way to already
guarantee the property in Ada today (with an appropriate checking tool, of
course; sounds like something AdaControl could do).

That is, one could insist that all discriminant dependent components are
bound with renames before use:

   Ping_Time : ... renames
      Get_Connection (From => Server).Ping_Info.Last_Ping_Id;

Combined with appropriate "if"s/assertions, you can be guaranteed that the
component exists and is safe.
(Indeed, with a proper tool, you really shouldn't need to do anything, as a
tool can relatively easily prove this property if it is provable at all.)

...
> >> 1. A style rule (that can easily be statically checked) that it is
> >>    forbidden to access variable components of a record via the regular
> >>    dot notation, i.e. you have to use matching.
...
> The main point is 1., because this is what will allow us to enforce the
> invariant that no component of a discriminated record is accessed if
> we don't know statically that it is correct.

At least in my code, it is common to have a subprogram that works on a
single variant. For instance, the routine I was working on yesterday
(slightly modernized):

   procedure Lookup_Allocator (Expr : in Node_Ptr)
      with Pre => Expr.Kind = Allocator;

   procedure Lookup_Allocator (Expr : in Node_Ptr) is
   begin
      Lookup_Expr (Expr.Allocator_Type);
         -- The component is discriminant-dependent.
      ...
   end Lookup_Allocator;

With your rule, you'd have to wrap this entire body in one of your case
statements, and presumably have an "others" clause with an Internal_Error
call. But that completely defeats the purpose of the precondition
(violating our style rule: "never repeat the precondition in the body"),
and would add a lot of extra verbiage to the code.

So that seems like a very silly rule to have in general. I could see having
it in code that is inside of a case statement, but that seems too limited
to be of much use. And clearly, any barely competent tool could prove that
the component use is safe (at least in the absence of some other task
causing mischief).

> With the rule enforced, the code at the beginning:
>
> C, C2 : Connection_Info
>
> case C is
>    when (Connected, <>, <>, (True, <>, <>)) =>
>        Print_Ping_Time (C2.Ping_Info.Ping_Time)
>
>
> Would be illegal because C2.Ping_Info.Ping_Time would fall under this
> rule.
>
> If this was really the intent of your code, you'd have to write:
>
> C, C2 : Connection_Info
>
> case C is
>    when (Connected, <>, <>, (True, <>, <>)) =>
>        Print_Ping_Time
>           (case C2 is
>               when (Connected, <>, <>, <>, (True, <>, )) => PT)
>               when others => No_Ping_Time)
>
>
> It is more verbose, which is in this case a good thing!
> You're ensuring that the programmer handles the error case explicitly.

Seriously, this looks like madness to me. No sane programmer is ever going
to write the second just to meet some style rule. (Especially if they have
to put all of the component names into the patterns!)

...
> It does not, as explained above, guarantee safety of an accessed
> component of a discriminated record if that component depends on the
> discriminant.

That's not a worthwhile goal if it requires writing gallons of unnecessary
code, especially in the precondition/predicate/assertion cases.

> > OTOH, your proposal doesn't seem to allow both partial matching AND direct
> > access to the enclosing (sub)record that contains that matching.
>
> It does. You just have to put your result in a declare block. You're not
> usually one to argue that this added verbosity is actually a big deal, I
> think!
>
> declare
>    CC : Connection_Info := Get_Connection (From => Server)
> begin
>    case CC is
>       when (State => Connected, Server => <>,
>             Session_Id => <>,
>             Ping_Info => (Has_Ping_Info => True,
>                           Last_Ping_Time => <>,
>                           Last_Ping_Id => <>)) =>
>          Put_Line ("Connected ! Session Id is "
>                    & CC.Session_Id & Display_Ping_Info (CC.Ping_Info));

If that's acceptable, then you don't need any binding mechanism and the
complications that it brings.

Besides, if you're really willing to write a lot of code, you don't need
this feature at all, so that simplifies it down to nothing -- the ultimate
simple solution. ;-)

****************************************************************

From: Raphael Amiard
Sent: Wednesday, June 14, 2017  8:38 AM

> Well, you have to be careful about making a proposal too complex. I've
> learned through much bitter experience that if you come up with a
> fully worked out proposal with all of the bells and whistles, you're
> most likely to end up with nothing. It probably would have been better
> to spring this component matching proposal when the rest of this idea
> is nearly finished... :-)

I think the component matching is integral to the feature, actually.
Something that only allows you to match literals would be crippled, both in
terms of expressivity and in terms of potential safety. It would still be a
big improvement on the status quo though, so I guess we can discuss this
live in Vienna!

> > This example of yours, while arguably expressive, also shows why it
> > would be hard to guarantee the property I have outlined above - no
> > access to fields if you can't guarantee their legality statically.
> > You would have to keep a shape of
> > the whole data structure, with known and unknown discriminants,
> > possibly across indirectly nested case statements. This is flow
> > analysis at this stage, and probably something you don't want to make
> > mandatory at the language level.
>
> Within one of your case statements (or, for that matter, in the
> scope of a renames of the component), the Legality Rule already
> guarantees the property you want. Indeed, because of the renames
> solution, there is a way to already guarantee the property in Ada
> today (with an appropriate checking tool, of course; sounds like something
> AdaControl could do).
>
> That is, one could insist that all discriminant dependent components
> are bound with renames before use:
>
>    Ping_Time : ... renames
>       Get_Connection (From => Server).Ping_Info.Last_Ping_Id;
>
> Combined with appropriate "if"s/assertions, you can be guaranteed that
> the component exists and is safe.
> (Indeed, with a proper tool, you really shouldn't need to do
> anything, as a tool can relatively easily prove this property if it is
> provable at all.)

I'm not sure I understand this point! We'll have to discuss this live.

> At least in my code, it is common to have a subprogram that works on a
> single variant. For instance, the routine I was working on yesterday
> (slightly modernized):
>
>    procedure Lookup_Allocator (Expr : in Node_Ptr)
>       with Pre => Expr.Kind = Allocator;
>
>    procedure Lookup_Allocator (Expr : in Node_Ptr) is
>    begin
>       Lookup_Expr (Expr.Allocator_Type);
>          -- The component is discriminant-dependent.
>       ...
>    end Lookup_Allocator;
>
> With your rule, you'd have to wrap this entire body in one of your
> case statements, and presumably have an "others" clause with an
> Internal_Error call. But that completely defeats the purpose of the
> precondition (violating our style rule: "never repeat the precondition
> in the body"), and would add a lot of extra verbiage to the code.
>
> So that seems like a very silly rule to have in general. I could see
> having it in code that is inside of a case statement, but that seems
> too limited to be of much use. And clearly, any barely competent tool
> could prove that the component use is safe (at least in the absence of
> some other task causing mischief).

I understand what you mean. I think it is also an issue of code style, and
as with error handling in general, there is no unique good solution.
However:

1. I do not think everybody would need to use that rule, or for that
   matter, such a simple rule. Its use is a trade-off between simplicity
   and safety, one that I would personally choose.

2. If you have a tool that does basic intra-procedural analysis, such as
   what you seem to be advocating, you could make the rule more powerful,
   by saying it only forces you to check the discriminant if it is not
   known at this point in the control flow, making the code above OK!
   In that case, being able to bind sub-components in the matchers is just
   a very DRY, convenient way to get at sub-components.

> Seriously, this looks like madness to me. No sane programmer is ever
> going to write the second just to meet some style rule. (Especially if
> they have to put all of the component names into the patterns!)

First, I don't agree that this second part should be enforced. Second, your
perspective on verbosity seems like a double standard to me:

- You're fine with repeating sub-component names completely, even though it
  brings no benefit to the user and makes checking safety harder.

- You think this is unacceptable verbosity even though it brings
  substantial benefits. This isn't just "some style rule", it is a
  (certainly restrictive) rule that allows the programmer to be sure that a
  certain class of errors is eliminated completely.

> ...
> > It does not, as explained above, guarantee safety of an accessed
> > component of a discriminated record if that component depends on the
> > discriminant.
>
> That's not a worthwhile goal

I strongly disagree with this unsubstantiated claim. I think it is a very
worthwhile goal.

> if it requires writing gallons of unnecessary
> code, especially in the precondition/predicate/assertion cases.

As explained above, we can imagine smarter rules/tools if you use a style
of code where you know you pass around objects with an already known
discriminant. Or you can still choose not to use the rule at all.

> Besides, if you're really willing to write a lot of code, you don't need
> this feature at all, so that simplifies it down to nothing -- the ultimate
> simple solution. ;-)

I don't see how that is true; at a minimum you still need some
flow-sensitive checking tool to guarantee access to fields that depend on a
discriminant.
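
[Editor's note: The property debated in this exchange can be observed in Ada
as it stands today. The following is only a minimal sketch and is not taken
from the thread: Connection_Info here is a hypothetical, much simplified
stand-in for the record used in the examples above, and the names
Variant_Access_Demo and Describe are assumptions made for illustration. It
shows a discriminant-dependent component being read inside a case branch
that selects its variant, and the same component being read from the
"wrong" object, which is legal Ada but fails the discriminant check at run
time.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Variant_Access_Demo is

   type Connection_State is (Disconnected, Connected);

   --  Hypothetical, simplified stand-in for the thread's Connection_Info.
   type Connection_Info (State : Connection_State := Disconnected) is record
      case State is
         when Connected =>
            Session_Id : Natural := 0;  --  discriminant-dependent component
         when Disconnected =>
            null;
      end case;
   end record;

   --  Guarded access: Session_Id is only named in the branch where the
   --  discriminant is known to select its variant.
   function Describe (C : Connection_Info) return String is
   begin
      case C.State is
         when Connected =>
            return "Connected, session" & Natural'Image (C.Session_Id);
         when Disconnected =>
            return "Disconnected";
      end case;
   end Describe;

   C  : constant Connection_Info := (State => Connected, Session_Id => 42);
   C2 : Connection_Info;  --  default discriminant: Disconnected

begin
   Put_Line (Describe (C));
   Put_Line (Describe (C2));

   --  Unguarded access to the other object, in the spirit of the C/C2
   --  example: this compiles, but the discriminant check fails.
   begin
      Put_Line (Natural'Image (C2.Session_Id));
   exception
      when Constraint_Error =>
         Put_Line ("Constraint_Error: C2 has no Session_Id");
   end;
end Variant_Access_Demo;
```

[Whether that last access should be rejected by a language rule, by a style
rule, or by a flow-sensitive tool is exactly the point on which the two
correspondents disagree; the sketch takes no side.]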

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 14, 2017  2:42 PM

>> Besides, if you're really willing to write a lot of code, you don't need
>> this feature at all, so that simplifies it down to nothing -- the ultimate
>> simple solution. ;-)
>
> I don't see how that is true; at a minimum you still need some
> flow-sensitive checking tool to guarantee access to fields that depend on a
> discriminant.

You need that in any case, and any ASIS-based tool has enough information to
make the check. So, for that matter, does any competent Ada optimizer. (This
would make a possible Code Quality Warning in Janus/Ada, see the most recent
blog entry on RRSoftware.Com -
http://www.rrsoftware.com/html/blog/quality.html - for the basic idea.) No
extra syntax needed.

****************************************************************

!topic Renaming in class membership test
!reference Ada 2012 RM4.4(3/4)
!from Niklas Holsti 18-01-15
!keywords membership renaming
!discussion

(This suggestion for some Ada extensions is taken from a discussion on
comp.lang.ada, started on 2018-01-04 by Dmitry A. Kazakov within a thread
with the irrelevant Subject "Re: stopping a loop iteration without exiting
it".)

It is sometimes necessary to supplement dynamic dispatching by manually
coded case analysis, using membership tests in which the
tested_simple_expression has a class-wide type and the membership_choice is
a descendant class-wide subtype, for example as follows, where X is a
class-wide expression:

   if X in T'Class then ...

If the test returns True, the following actions usually need to access the
tested_simple_expression (X) as an object of the membership_choice
descendant type (T'Class). This leads to the following clumsy construction,
which requires writing the descendant type identifier three times, and
introducing a new indentation level:

   if X in T'Class then
      declare
         Same_X : T'Class renames T'Class (X);
      begin
         ... use Same_X as an object of T'Class
      end;
   end if;

Both Dmitry and I have been bothered by this feature of class-based case
analysis. It is suggested to allow a combination of the membership test and
the declaration of the renaming (Same_X), as in:

   if X is Same_X : T'Class then
      ... Here Same_X is a renaming of T'Class (X).
   end if;

This form uses a different keyword ("is", not "in") to separate it from the
normal membership test. Clearly, this form of membership test cannot have
more than one membership_choice (that is, it cannot be
"if X is T'Class | S'Class then ...") and it cannot be a negative test (it
cannot be "is not").

Earlier in the same discussion thread, a similarly extended "case" statement
for a class-wide selecting_expression was suggested, as in:

   case X is   -- or "case X'Tag"
   when Some_T : T'Class =>
      ... Here Some_T is a renaming of T'Class(X).
   when Some_S : S =>
      ... Here Some_S is a renaming of S(X).
   when others =>
      ...
   end case;

where the legality conditions would require that no two "when" clauses have
overlapping classes (that is, both "whens" cannot be True for the same X)
and that an "others" clause always be present. However, this form could be
problematic in a generic context, where the non-overlapping requirement of
formal generic types (T, S) might not be easy to check at compile-time.

A further extension to the above would let the selecting_expression (X) be
an access-to-class-wide, instead of a class-wide, with implicit
dereferencing for the renamings (Some_T would be a renaming of
T'Class(X.all)), and would then permit a "when null => ..." to handle the
case X = null.

Continuing with further variations, access types in general can sometimes
lead to similar clumsy renamings, as in this example from Dmitry, where P is
access X_Type:

   if P /= null then
      declare
         X : X_Type renames P.all;
      begin
         ...
      end;

Here, again, an extension might allow a "case" statement with the access
value P as the selecting_expression, although it can access only a single
type, and a renaming combined with the "when":

   case P is
   when X : X_Type =>
      ... Here X is a renaming of P.all.
   when null =>
      ...
   end case;

Finally, a similar extension was suggested to the normal "case" statement,
with a discrete selecting_expression. Here the extension is not needed to
avoid a renaming declaration, but could help readability. For an example,
from Dmitry, the following code:

   declare
      Symbol : constant Character := Get_Character;
   begin
      case Symbol is
      when '0'..'9' =>
         -- Process digit
      when 'A'..'Z' | 'a'..'z' =>
         -- Process letter

could be replaced by this, somewhat simpler code:

   case Get_Character is
   when Digit : '0'..'9' =>
      -- Process the digit Digit.
   when Letter : 'A'..'Z' | 'a'..'z' =>
      -- Process the letter Letter.

As observed in the comp.lang.ada thread, these suggestions have the common
flavour of introducing a kind of "pattern matching" syntax into Ada control
flow, but a very simple one (the pattern defines a single new name).

****************************************************************

From: Randy Brukardt
Sent: Saturday, January 20, 2018  8:15 PM

> As observed in the comp.lang.ada thread, these suggestions have the
> common flavour of introducing a kind of "pattern matching" syntax into
> Ada control flow, but a very simple one (the pattern defines a single
> new name).

There already is such a "pattern matching" proposal in the hopper; the
current plan is to split it from AI12-0214-1 where it currently lives.

I first became concerned about the number of gee-gaws proposed for Ada 2020
because of this pattern matching proposal, so that should make it fairly
obvious where I stand on this one. ;-)

...
> If the test returns True, the following actions usually need to access
> the tested_simple_expression (X) as an object of the membership_choice
> descendant type (T'Class).
> This leads to the following clumsy
> construction, which requires writing the descendant type identifier
> three times, and introducing a new indentation level:
>
>    if X in T'Class then
>       declare
>          Same_X : T'Class renames T'Class (X);
>       begin
>          ... use Same_X as an object of T'Class
>       end;
>    end if;

I've written a lot of such code (especially in the Claw Builder), and I
never once used this construction. In the Claw Builder, X typically is a
dereference of an access to a Root_Window, and the test is needed to call
some operation only defined for a child hierarchy (for instance, for
controls). In such cases, the dependent code is almost always a single
(dispatching?) call, and since there is only one use of the name, it is
better to just use the type conversion directly on the call parameter,
rather than to introduce an extra name. Even if there are several uses, it
is often the case that the conversions are hardly any longer than the
renaming, so the simpler code is preferred.

In general, I think it is a bad idea to rename objects, as that requires
introducing an additional identifier to the program, increasing the number
that a reader must understand. Renaming is mainly a construction that helps
the writer, not the reader. I think it is best reserved for the rare cases
where the entity has to be evaluated once rather than multiple times. (It
also can make the code slower, by creating an intermediate storage location
with associated memory write(s), rather than just evaluating into
registers).

Aliasing (having multiple names for the same thing) makes code harder to
understand both for the compiler and for human readers. There's a reason
that it is better to pass in all of the objects needed for a given
subprogram even if they are visible elsewhere (so the reader can consider
the subprogram as a single unit without considering any possible aliasing).

...
> Continuing with further variations, access types in general can
> sometimes lead to similar clumsy renamings, as in this example from
> Dmitry, where P is access X_Type:
>
>    if P /= null then
>       declare
>          X : X_Type renames P.all;
>       begin
>          ...
>       end;

This is even worse. You've introduced an entire block and an extra name to
save 4 characters on each use! Any reasonable compiler will eliminate any
redundant checks, so this construction buys nothing except a bunch of extra
lines in the code (and an obvious reduction in readability).

Now, I realize I am a charter member in the "write it all out explicitly"
club (I doubt that many other Ada programmers write out
"Ada.Strings.Unbounded.To_Unbounded_String" as often as I do), but Ada code
is (or should be) primarily about making the result easy to read and
understand. Reasonable people can disagree about how long is too long, but
the only time constructions like the above make sense is when the new name
is substantially shorter (remaining understandable) than the original name.
And in such cases, the length of the construction isn't particularly
relevant (since it is a lot less than the names in question). Shorthands
here just make it easier for the writer rather than helping the reader.

****************************************************************
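
[Editor's note: For comparison, the two existing-Ada styles contrasted in
the last two messages can be put side by side. This is a minimal sketch in
Ada 2012, not code from either correspondent: the package Windows, the types
Root_Window and Control, and the procedure Show are hypothetical names,
only loosely modelled on the Claw Builder description above. With_Renaming
uses the membership test followed by a renaming of the view conversion;
With_Conversion applies the conversion directly at the single point of use.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Membership_Rename_Demo is

   --  Hypothetical two-level hierarchy, for illustration only.
   package Windows is
      type Root_Window is tagged null record;
      type Control is new Root_Window with record
         Id : Natural := 0;
      end record;
      procedure Show (C : Control'Class);
   end Windows;

   package body Windows is
      procedure Show (C : Control'Class) is
      begin
         Put_Line ("Control" & Natural'Image (C.Id));
      end Show;
   end Windows;

   use Windows;

   --  Style 1: membership test, then a renaming of the view conversion.
   procedure With_Renaming (X : Root_Window'Class) is
   begin
      if X in Control'Class then
         declare
            Same_X : Control'Class renames Control'Class (X);
         begin
            Show (Same_X);
         end;
      end if;
   end With_Renaming;

   --  Style 2: membership test, then the conversion at the point of use.
   procedure With_Conversion (X : Root_Window'Class) is
   begin
      if X in Control'Class then
         Show (Control'Class (X));
      end if;
   end With_Conversion;

   A_Control : constant Control := (Root_Window with Id => 7);
   A_Window  : Root_Window;

begin
   With_Renaming (A_Control);
   With_Conversion (A_Control);
   With_Renaming (A_Window);    --  not a Control: nothing is shown
   With_Conversion (A_Window);  --  not a Control: nothing is shown
end Membership_Rename_Demo;
```

[With a single use of the converted object, as here, the second form is the
shorter one; the renaming only starts to pay for itself when the object is
used several times, which is the trade-off the two messages argue over.]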