!standard A.4.3(16)                                   10-06-07  AI05-0031-1/04
!standard A.4.3(67)
!standard A.4.3(68/1)
!standard A.4.4(51)
!standard A.4.5(46)
!class Amendment 06-11-03
!status Amendment 2012 10-04-05
!status ARG Approved 7-0-2  10-02-26
!status work item 06-11-03
!status received 06-11-03
!priority Low
!difficulty Easy
!subject Add a From parameter to Find_Token

!summary

Add a version of Find_Token with a From parameter to all three predefined
string packages.

!problem

Find_Token in Ada.Strings.Fixed, Ada.Strings.Unbounded, and
Ada.Strings.Bounded has no version with a From index. A From index (the index
at which to start looking) has been added to the Index routines in those same
packages. It is important to be able to start in the middle of a long string
when iterating to find multiple tokens.

!proposal

Add a version of Find_Token with a From parameter to all of the predefined
string packages.

!wording

Add before A.4.3(16):

   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

[Editor's Note: In the third kind of Index, the From parameter is placed
before the Test : in Membership parameter. I'm not sure why, but I put it in
the same place here, to be consistent. The worst thing would be to have it in
all different places.]

Add before A.4.3(67):

   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

   If Source is not the null string and From is not in Source'Range, then
   Index_Error is raised. Otherwise, First is set to the index of the first
   character in Source(From..Source'Last) that satisfies the Test condition.
   Last is set to the largest index such that all characters in
   Source(First..Last) satisfy the Test condition. If no characters in
   Source(From..Source'Last) satisfy the Test condition, First is set to
   From, and Last is set to 0.

Replace A.4.3(68/1) by:

   Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).

   AARM Ramification: If Source'First is not in Positive, which can only
   happen for an empty string, this will raise Constraint_Error.

Add before A.4.4(51):

   procedure Find_Token (Source : in Bounded_String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

Add before A.4.5(46):

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

!discussion

This is a consistency change; all of the searching routines in the Strings
packages should have similar capabilities.

Note that we add the new routines before the old ones, to be consistent with
how that was done for function Index.

We reworded the definition of Find_Token to be simpler and to make it clear
that the longest possible slice starting at From is returned. That is not the
same as ensuring that the character immediately before the returned slice
does not satisfy the test condition (which was the criterion of the original
wording). To see the difference, consider the following:

   Source(1..7) := "1234567";
   Set := "345";
   Find_Token (Source, Set, From => 3, Test => Inside,
               First => First, Last => Last);
   -- After this call, First = 3, Last = 5.
   Find_Token (Source, Set, From => 4, Test => Inside,
               First => First, Last => Last);
   -- After this call, First = 4, Last = 5.
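[Editor's sketch: the two calls above can be exercised as a small test
program. This is only an illustration, assuming an implementation that
already provides the Find_Token profile proposed by this AI; the program name
Discussion_Example is made up.]

   with Ada.Strings.Fixed;
   with Ada.Strings.Maps;

   procedure Discussion_Example is
      use Ada.Strings;
      Source : constant String (1 .. 7) := "1234567";
      Set    : constant Maps.Character_Set := Maps.To_Set ("345");
      First  : Positive;
      Last   : Natural;
   begin
      Fixed.Find_Token (Source, Set, From => 3, Test => Inside,
                        First => First, Last => Last);
      pragma Assert (First = 3 and Last = 5);
      -- Longest slice of characters from Set starting at index 3.

      Fixed.Find_Token (Source, Set, From => 4, Test => Inside,
                        First => First, Last => Last);
      pragma Assert (First = 4 and Last = 5);
      -- The '3' at index 3 is ignored because it is before From.
   end Discussion_Example;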
In the result of the second call (From => 4), the character at index 3 of the
source string does meet the condition, but that fact is ignored because From
is greater than 3. This interpretation makes

   Find_Token (Source(From..Source'Last), Set, Test, First, Last);

give the same results as

   Find_Token (Source, Set, From, Test, First, Last);

which seems to be the most natural interpretation.

!example

!corrigendum A.4.3(16)

@dinsb
@xcode<@b<procedure> Find_Token (Source : @b<in> String;
                      Set    : @b<in> Maps.Character_Set;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>
@dinst
@xcode<@b<procedure> Find_Token (Source : @b<in> String;
                      Set    : @b<in> Maps.Character_Set;
                      From   : @b<in> Positive;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>

!corrigendum A.4.3(67)

@dinsb
@xcode<@b<procedure> Find_Token (Source : @b<in> String;
                      Set    : @b<in> Maps.Character_Set;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>
@dinss
@xcode<@b<procedure> Find_Token (Source : @b<in> String;
                      Set    : @b<in> Maps.Character_Set;
                      From   : @b<in> Positive;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>
@xindent<If Source is not the null string and From is not in Source'Range,
then Index_Error is raised. Otherwise, First is set to the index of the first
character in Source(From..Source'Last) that satisfies the Test condition.
Last is set to the largest index such that all characters in
Source(First..Last) satisfy the Test condition. If no characters in
Source(From..Source'Last) satisfy the Test condition, First is set to From,
and Last is set to 0.>

!corrigendum A.4.3(68)

@drepl
Find_Token returns in First and Last the indices of the beginning and end of
the first slice of Source all of whose elements satisfy the Test condition,
and such that the elements (if any) immediately before and after the slice do
not satisfy the Test condition. If no such slice exists, then the value
returned for Last is zero, and the value returned for First is Source'First;
however, if Source'First is not in Positive then Constraint_Error is raised.
@dby
Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).

!corrigendum A.4.4(51)

@dinsb
@xcode<@b<procedure> Find_Token (Source : @b<in> Bounded_String;
                      Set    : @b<in> Maps.Character_Set;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>
@dinst
@xcode<@b<procedure> Find_Token (Source : @b<in> Bounded_String;
                      Set    : @b<in> Maps.Character_Set;
                      From   : @b<in> Positive;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>

!corrigendum A.4.5(46)

@dinsb
@xcode<@b<procedure> Find_Token (Source : @b<in> Unbounded_String;
                      Set    : @b<in> Maps.Character_Set;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>
@dinst
@xcode<@b<procedure> Find_Token (Source : @b<in> Unbounded_String;
                      Set    : @b<in> Maps.Character_Set;
                      From   : @b<in> Positive;
                      Test   : @b<in> Membership;
                      First  : @b<out> Positive;
                      Last   : @b<out> Natural);>

!ACATS test

Add ACATS C-Tests for the new subprograms (in all of Ada.Strings.Fixed,
Ada.Strings.Bounded, and Ada.Strings.Unbounded).

!appendix

From: Pascal Obry
Sent: Monday, October 30, 2006  2:21 PM

I just noticed that the Find_Token in (Fixed, Unbounded and Bounded) has no
version with a From index. This is especially important when iterating over a
long string to find multiple tokens. Such a From index (the index where to
start looking) has been added into the Index routines. Why not for
Find_Token?

The only version is:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

I would like to propose this:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         From   : in Positive;
                         First  : out Positive;
                         Last   : out Natural);

From being here the starting position to look for the given token. An
alternate solution could be to use First:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : in out Positive;
                         Last   : out Natural);

In this case the First parameter is changed to mode "in out", the initial
value being the starting position to look for the given token.
This last solution looks better to me. Thoughts?

****************************************************************

From: Adam Beneschan
Sent: Friday, November 3, 2006  1:07 PM

> I just noticed that the Find_Token in (Fixed, Unbounded and Bounded) has
> no version with a From index. This is especially important when
> iterating over a long string to find multiple tokens. Such a From index
> (the index where to start looking) has been added into the Index routines.
> Why not for Find_Token?

I just checked and Find_Token is not mentioned at all in AI-301 (including
all of the e-mail). Looks to me like nobody else noticed it. I think you're
right, this is an omission.

> The only version is:
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : out Positive;
>                          Last   : out Natural);
>
> I would like to propose this:
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          From   : in Positive;
>                          First  : out Positive;
>                          Last   : out Natural);
>
> From being here the starting position to look for the given token. An
> alternate solution could be to use First:
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : in out Positive;
>                          Last   : out Natural);
>
> In this case the First parameter is changed to mode "in out", the
> initial value being the starting position to look for the given token.
> This last solution looks better to me.
>
> Thoughts?

I definitely like the first solution (separate From and First parameters)
better. If the second solution were adopted, I think a call to it would look
confusing, since the parameter would have to be a variable used for one
meaning before the call and a different (although vaguely similar) meaning
after the call. Anyway, I've seen code that calls routines like that and I
always end up scratching my head trying to figure out what the heck is going
on.

****************************************************************

From: Randy Brukardt
Sent: Friday, November 3, 2006  11:28 PM

> I just checked and Find_Token is not mentioned at all in AI-301
> (including all of the e-mail). Looks to me like nobody else noticed
> it.

I'm not sure that anyone knows that Find_Token exists or what it does. So
it's not surprising that it didn't immediately come to mind. Anyway, I think
you could make the argument that the "From" parameter is useful for pretty
much all of the Unbounded string routines, but it is really easy for that to
turn into feeping creaturism. (It's hard to find much use for most of the
Unbounded string routines anyway.) So where do you draw the line?

I suspect that adding much more to AI-301 would have killed it (it was a
tough sell originally), so I think it was best that Find_Token was left out.
That doesn't mean that we shouldn't think about adding it in the future.

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006  12:43 AM

> I'm not sure that anyone knows that Find_Token exists or what it does. So
> it's not surprising that it didn't immediately come to mind. Anyway, I think
> you could make the argument that the "From" parameter is useful for pretty
> much all of the Unbounded string routines, but it is really easy for that to

Why all? Apart from Index and Find_Token, which can be used repeatedly to
look for some patterns in a string, I don't see the need for the others.

> turn into feeping creaturism. (It's hard to find much use for most of the
> Unbounded string routines anyway.) So where do you draw the line?

Hard to find much use? OK, I must be different then :) Frankly this is quite
a nice addition to Ada 95, and there are services in Ada.Strings.Unbounded
that I use all the time! I definitely think that improving it is very
important, hence my Find_Token proposal. The better the interface, the more
it will be used!

The solution to my problem today is to convert the unbounded_string to a
string and to take successive slices to pass to Find_Token. This is not
acceptable for a language like Ada!

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  2:22 PM

> The solution to my problem today is to convert the unbounded_string to a
> string and to take successive slices to pass to Find_Token. This is not
> acceptable for a language like Ada!

Why not use Ada.Strings.Unbounded.Slice?

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006  2:36 PM

Performance?

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  8:18 PM

Then you probably shouldn't be using Ada.Strings.Unbounded.

****************************************************************

From: Pascal Obry
Sent: Tuesday, November 7, 2006  1:33 AM

It's not because unbounded strings are slower than standard strings that I
must be OK with an even worse implementation of Find_Token. Dealing with
unbounded strings directly is OK; it is the conversion from/back to string
that hits performance. I want to avoid that.

Note also that with a good cache, unbounded strings are not that slow. See
the GNAT implementation for example.

And we are speaking of a very simple addition; it looks worth it to me.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, November 7, 2006  6:25 PM

> It's not because unbounded strings are slower than standard strings that I
> must be OK with an even worse implementation of Find_Token. Dealing with
> unbounded strings directly is OK; it is the conversion from/back to string
> that hits performance. I want to avoid that.

Then Jeff is right. To use the unbounded strings package requires lots of
conversions back and forth, simply because most of the operations in the
unbounded strings package take String, not Unbounded_String, arguments. For
instance, my spam filter does a lot of searching for patterns (stored as
lists of unbounded strings) in messages (stored as lists of unbounded
strings). The patterns have to be converted to strings on every use - ugh.
[Yes, I could have stored the patterns as regular strings, but then I'd have
to do a lot of memory management on the lists of patterns. And if I did that,
I would necessarily convert the messages (since they're stored in the same
type) to regular strings as well -- and I wouldn't use unbounded strings at
all.]

So if you need maximum performance, you can't use unbounded strings. If the
memory management aspects are more important to you than performance, then
the extra conversions cannot be a big deal. You can't have it both ways
(mainly because Ada doesn't have a way to give string literals to private
types -- but even if it did, you'd need a package quite different than
unbounded strings).

> Note also that with a good cache, unbounded strings are not that slow. See
> the GNAT implementation for example.
>
> And we are speaking of a very simple addition; it looks worth it to me.

But remember that any change to the standard packages is (potentially)
incompatible. We need a strong justification to introduce incompatibilities.
We took a somewhat weaker hurdle for incompatibilities in the Amendment,
because it represented a major update and we expected users to be unsurprised
about minor glitches from rare incompatibilities.

Note that we do *not* allow these new routines in Ada 95 implementations.
That's specifically because of the compatibility concerns -- we do not want
programs that work on one Ada 95 compiler to fail on another because of the
presence or absence of these new routines.

But the Amendment is done now, and it is in use (at least with GNAT). Changes
now have a higher burden. Of course, if there is an actual bug (wrong mode,
wrong type, etc.), that should be fixed, but we're not in the business of
making changes that might break real, existing programs simply because it
seems inconsistent and it is a "simple addition".

If this comment had been made a year ago while the Amendment was still being
finalized, the change might very well have been made. But the Amendment is
frozen (and mostly approved) and in use. In my opinion, nice-to-haves have to
wait for the next revision/Amendment. Whenever that is.

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, November 7, 2006  8:11 PM

> Then Jeff is right. To use the unbounded strings package requires lots of
> conversions back and forth, simply because most of the operations in the
> unbounded strings package take String, not Unbounded_String, arguments. For
> instance, my spam filter does a lot of searching for patterns (stored as
> lists of unbounded strings) in messages (stored as lists of unbounded
> strings). The patterns have to be converted to strings on every use - ugh.
> [Yes, I could have stored the patterns as regular strings, but then I'd have
> to do a lot of memory management on the lists of patterns. And if I did
> that, I would necessarily convert the messages (since they're stored in the
> same type) to regular strings as well -- and I wouldn't use unbounded
> strings at all.]

What he said.

> If this comment had been made a year ago while the Amendment was still
> being finalized, the change might very well have been made. But the
> Amendment is frozen (and mostly approved) and in use. In my opinion,
> nice-to-haves have to wait for the next revision/Amendment. Whenever that
> is.

My guess is 2019.

****************************************************************

From: John Barnes
Sent: Wednesday, November 8, 2006  1:27 AM

> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

One of the features that Tuck proposed when doing Ada 9x was to allow the
definition of literals for private types. I thought it was a wonderful idea
and still miss it. But it was killed at an early stage.

A thought for Ada 2016?

****************************************************************

From: Christoph Grein
Sent: Wednesday, November 8, 2006  2:07 AM

Why not, but how would those literals be different from enums?
We already have a kind of such "literals" as parameterless functions
returning objects of the private type.

How could we define "string literals" (or aggregates) for private types?

What kind of literals are envisaged after all?

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  3:12 PM

> How could we define "string literals" (or aggregates) for private types?
>
> What kind of literals are envisaged after all?

The idea is that the programmer provides a function that converts from the
source representation to the type, and this function is implicitly called
when a literal appears in the source code. Perhaps:

   function My_Literal_Function (X : String) return My_Time_Type;
   for My_Time_Type'Literal use My_Literal_Function;

Then:

   X : My_Time_Type := "June 1, 2006, at 10 o'clock";

would be equivalent to:

   X : My_Time_Type := My_Literal_Function("June 1, 2006, at 10 o'clock");

Or:

   function Lit (X : String) return Bignum;
   for Bignum'Literal use Lit;

   X : Bignum := (2 ** 100) - 1_000_000_000_000_000_000_000_000_000_000;

One could do similar things for record aggregates and extension aggregates.
Array aggregates are tricky.

The overload resolution rules would have to be changed incompatibly.
Currently, in P(123), the 123 can be used to choose a P that takes Integer
over some non-integer type. That call would have to be ambiguous.

****************************************************************

From: Alexander E. Kopilovich
Sent: Wednesday, November 8, 2006  9:27 PM

> How could we define "string literals" (or aggregates) for private types?
>
> What kind of literals are envisaged after all?

and Robert A. Duff replies:

> The idea is that the programmer provides a function that converts from the
> source representation to the type, and this function is implicitly called
> when a literal appears in the source code. Perhaps:
>
>    function My_Literal_Function (X : String) return My_Time_Type;
>    for My_Time_Type'Literal use My_Literal_Function;

Yes, something of this kind I proposed here 3 years ago (and that proposition
received the honorary status "no action" on 03-12-05):

http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ACs/AC-00090.TXT?rev=1.2

****************************************************************

From: Pascal Obry
Sent: Wednesday, November 8, 2006  3:37 AM

Randy Brukardt wrote:

> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

Looks like I'm not making myself clear. First of all, I'm not looking for
maximum performance. I'm just trying to avoid maximum performance
degradation. That's quite different to me.

Secondly, I'd also like to point out that if the unbounded_string is huge,
converting to string might not be an option.

Last, I'm not pushing to have this in Ada 2005. I raised an issue and
everybody seems to be working hard to find arguments to dismiss it. Just to
be clear, I'm perfectly fine to have this issue dropped right now or
scheduled for the next amendment.

> But remember that any change to the standard packages is (potentially)
> incompatible. We need a strong justification to introduce incompatibilities.
> We took a somewhat weaker hurdle for incompatibilities in the Amendment,
> because it represented a major update and we expected users to be
> unsurprised about minor glitches from rare incompatibilities.

I understand; in the current case I don't see what kind of incompatibilities
could be introduced.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  5:35 PM

...
> Last, I'm not pushing to have this in Ada 2005. I raised an issue and
> everybody seems to be working hard to find arguments to dismiss it. Just to
> be clear, I'm perfectly fine to have this issue dropped right now or
> scheduled for the next amendment.

Oh, OK. I naturally assumed that you were looking for a change sooner than 10
years from now, as we're not intentionally looking for new Amendment ideas
now. (Of course, they sometimes come up organically, as in the other thread
that's going on now. They'll get filed somewhere for future reference.)

> > But remember that any change to the standard packages is (potentially)
> > incompatible. We need a strong justification to introduce
> > incompatibilities. We took a somewhat weaker hurdle for incompatibilities
> > in the Amendment, because it represented a major update and we expected
> > users to be unsurprised about minor glitches from rare incompatibilities.
>
> I understand; in the current case I don't see what kind of
> incompatibilities could be introduced.

Pretty much any change to a predefined package can cause problems if the
package is USEd. And it's pretty common to reference the predefined packages
with a use clause.

The problem occurs if there is a user-defined routine with the same name in
some package that is used as well. In that case, adding a new routine can
make existing calls ambiguous. Worse, child packages of Unbounded can have
the behavior of their calls changed silently (the new routine, rather than
the user-defined one, would be called, as the new one would be directly
visible and that has priority over any use-visibility).

Obviously, it's not particularly likely for there to be something called
Find_Token in user code; but my experience is that the names of predefined
routines often get "borrowed" for other purposes (they tend to be good,
simple names, and programmers are familiar with them). And, as I said before,
it's not clear that we're willing to have any unnecessary incompatibilities
when we're purely in bug-fixing mode (as opposed to Amendment mode).

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  2:58 PM

> I understand; in the current case I don't see what kind of
> incompatibilities could be introduced.

Whenever a new subprogram is added to a package, it causes an
incompatibility. In particular, if another subprogram with the same name and
profile exists in some user's package, and both packages have use_clauses,
then calls to the user's subprogram become illegal, due to the name conflict.

But it's hardly a reason to say "never add a subprogram to a predefined
package"!

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  7:18 PM

> Obviously, it's not particularly likely for there to be something called
> Find_Token in user code; ...

Actually, that's not so obvious. Pascal wants Find_Token-with-From. If he
doesn't get it from the ARG, I'd say it's quite likely that he will declare
it in his own package! So if ARG adds it, it _will_ conflict.
Whether he will consider that a bug or a feature is an interesting
question. ;-)

>...but my experience is that the names of predefined
> routines often get "borrowed" for other purposes...

Well, OK, but if it's "for other purposes", it has a different profile, and
therefore won't conflict. (Presuming it's overloadable, as is the case for
subprograms.)

>... (they tend to be good,
> simple names, and programmers are familiar with them). And, as I said before,
> it's not clear that we're willing to have any unnecessary incompatibilities
> when we're purely in bug-fixing mode (as opposed to Amendment mode).

It's a judgement call. I have no strong opinion one way or 'tother here. I
don't think the possibility of name conflicts should absolutely rule out
additions to predefined packages.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  7:38 PM

> It's a judgement call. I have no strong opinion one way or 'tother here.
> I don't think the possibility of name conflicts should absolutely rule
> out additions to predefined packages.

Well, I'd agree personally, but the ARG has come down on the side of
compatibility in Ada 95 vs. Ada 2007 changes. As I'm sure you know, GNAT has
pragmas and switches to ensure that the new subprograms are not used by Ada
95 programs -- and that was discussed and required by the ARG.

I don't see how this case (or any other case not involving a clear bug)
differs from that decision - Ada 2007 (or Ada 2005 if you prefer) is frozen
now and I don't think we should be making random incompatible changes other
than to fix bugs.

****************************************************************

From: Pascal Leroy
Sent: Thursday, November 9, 2006  2:07 AM

> It's a judgement call. I have no strong opinion one way or
> 'tother here. I don't think the possibility of name conflicts
> should absolutely rule out additions to predefined packages.

I am not too concerned about name conflicts (I believe that they are
extremely improbable) but I am concerned about portability. If we add new
subprograms now, it is not clear if/when they will be incorporated in
compilers. So programs that use the new and improved Find_Token may not port.
Not a good thing.

On the other hand, there aren't many compiler technologies left...

****************************************************************

From: Pascal Obry
Sent: Thursday, November 9, 2006  2:56 PM

On the other hand, we are talking about a trivial implementation: 5 minutes
for the implementation, 15 minutes to add a non-regression test! So I don't
see a portability problem here; at least vendors won't have hard work
supporting this.

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  6:54 PM

I'd argue with your numbers (they're several orders of magnitude low), but
they're irrelevant in any case (as we've discussed in the ARG several times).
Vendors don't release new versions of compilers for every 10-minute change
that comes from the ARG! Depending on the vendor, compiler releases require a
lot of QA testing, documentation work, and the like. Often, release cycles
are a year or more long.

Moreover, some vendors (and most users) only use completed ISO standards for
their work (ignoring ARG rulings in between). Even if this change was adopted
at the upcoming ARG meeting, it would not appear in a published standard for
several more years.
So (if adopted now) there would be a period (probably a long period) where
some implementations implemented the change and some did not. This would
cause a portability issue, as Pascal Leroy pointed out. Moreover, it would
mean that cautious users could not use the new routine (and most likely, many
of them would not even know it exists, since it would not appear in the
Standard).

This is precisely the situation that the ARG voted to not allow to happen
with Ada 95 compilers vis-a-vis the new Index functions. I don't see why Ada
2005 compilers should be any different. (Indeed, I would be very upset if we
were to go ahead with this subprogram, but continue to not allow a similar
incompatibility in Ada 95 compilers. The effect of that is to require a
significant amount of work to allow routines in the runtime to be accessed or
invisible depending on a compiler switch -- a *lot* more work than "5 minutes
for the implementation".)

****************************************************************

From: Dan Eilers
Sent: Thursday, November 9, 2006  7:20 PM

> So (if adopted now) there would be a period (probably a long period) where
> some implementations implemented the change and some did not. This would
> cause a portability issue, as Pascal Leroy pointed out. ...

This portability concern would seem to apply to just about any non-editorial
AI ever considered by the ARG. Are you suggesting that the ARG should stop
considering non-editorial AIs just because implementers may implement them at
different times? Or is this particular issue somehow special?

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  7:46 PM

No, of course not.

Certainly, the concern doesn't apply to Amendment-class AIs (because they
won't be implemented now, and when they are implemented it will be as part of
a new version of the language).

It does apply to all other AIs. But, most AIs are upwards compatible (while
additions/changes to the standard library are not). For instance, adding the
missing wording that Adam pointed out is a compatible change (it's unlikely
that anyone would have intentionally implemented anything other than the
rules for instantiation, especially as the rules were correct in Ada 95).
Those that aren't fix significant bugs in the Standard or omissions where it
is not clear what an implementer should do. (In the latter case, the AI
actually increases compatibility in the long run.)

If there are AIs that don't fit in any of these categories, and they cause
incompatibilities, then they probably should not be adopted (or should be
reclassified as Amendment AIs).

****************************************************************

From: Pascal Leroy
Sent: Friday, November 10, 2006  1:42 AM

> Vendors don't release
> new versions of compilers for every 10-minute change that
> comes from the ARG! Depending on the vendor, compiler
> releases require a lot of QA testing, documentation work, and
> the like. Often, release cycles are a year or more long.

Not to mention that, once a release is out, users don't rush to adopt it. We
still have users happily using a version that we released in 2000, and they
won't move to more recent stuff for fear of destabilizing their environment.
These big projects have huge inertia.
****************************************************************

From: Randy Brukardt
Sent: Thursday, February 11, 2010  11:18 PM

One of my action items was to create wording for the additional Find_Token
routine that we think should be added for consistency. Here's what I came up
with:

   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

   Find_Token returns in First and Last the indices of the beginning and end
   of the first slice of Source where First >= From, all of the elements of
   the slice satisfy the Test condition, and such that the elements (if any)
   immediately before and after the slice do not satisfy the Test condition.
   If no such slice exists, then the value returned for Last is zero, and the
   value returned for First is From; however, if From is not in Positive then
   Constraint_Error is raised.

Unfortunately, it's not clear this is what is intended. What is supposed to
happen if From is in the middle of a token? Consider:

   Source(1..7) := " 345 ";
   Set := " ";
   Find_Token (Source, Set, From => 3, Test => Outside,
               First => First, Last => Last);
   -- After this call, First = 3, Last = 5.
   Find_Token (Source, Set, From => 4, Test => Outside,
               First => First, Last => Last);
   -- After this call, First = 4, Last = 0.

The latter result requires some explanation. The wording requires three
things about a slice: First >= From, all the elements in the slice satisfy
the test, *and* the elements immediately before and after the slice do not
satisfy the test condition. The slice 3 .. 5 fails the first test, and the
slice 4 .. 5 fails that last test (because the element at From-1 *does*
satisfy the test).

I considered an alternative wording where the string would act like it was
truncated at From. That however has the effect of possibly splitting tokens,
which seems weird. But the defined semantics is weird, too. This seems like a
possible reason why we didn't add From to this routine in the first place.

I have no idea which of these semantics is right. Index doesn't care what
precedes From, so it is no help. Returning "45" from the string " 345 " seems
wrong. OTOH, the similar call:

   Find_Token (Source(4..7), Set, Test => Outside,
               First => First, Last => Last);

does return First = 4, Last = 5.

Thoughts? And if you want the alternative wording, how would you word it? I
had enough trouble getting the above wording to make sense.

****************************************************************

From: Bob Duff
Sent: Friday, February 12, 2010  8:41 AM

> Find_Token returns in First and Last the indices of the beginning and
> end of the first slice of Source where First >= From, all of the
> elements of the slice satisfy the Test condition, and such that the
> elements (if any) immediately before and after the slice do not
> satisfy the Test condition. If no such slice exists, then the value
> returned for Last is zero, and the value returned for First is From;
> however, if From is not in Positive then Constraint_Error is raised.

The existing wording "immediately before and after" is kind of bogus, IMHO.
"Immediately before" is implied by "first slice". And the point of "after" is
to require it to be the longest such slice. I would have worded it using
"longest such slice" or something like that, but I don't suggest we change it
now.

The "if From is not in Positive" part goes without saying, because the
subtype of From is Positive.
Can't we just say, "equivalent to Find_Token(Source(From..Source'Last), Set,
Test, First, Last)"?

Or, "does the same thing as the previous Find_Token procedure, passing
Source(From..Source'Last) as the Source parameter".

> Unfortunately, it's not clear this is what is intended.

Indeed, it's clear that it's not what is intended. ;-)

>...What is supposed to
> happen if From is in the middle of a token?

It usually won't be, because the intended use is to loop repeatedly finding
tokens. But if it is, ignore that fact -- you don't want it to look at any
character before From. You told it to start searching at From, so that's what
it should do.

>    Source(1..7) := " 345 ";
>    Set := " ";
>    Find_Token (Source, Set, From => 3, Test => Outside,
>                First => First, Last => Last);
>    -- After this call, First = 3, Last = 5.

Right.

>    Find_Token (Source, Set, From => 4, Test => Outside,
>                First => First, Last => Last);
>    -- After this call, First = 4, Last = 0.

We want this to return 4..5, not 4..0.

> I considered an alternative wording where the string would act like it was
> truncated at From.

That's what you want.

>... That however has the effect of possibly splitting tokens, which
> seems weird.

I don't think it's a problem.

>...But the defined semantics is weird, too. This seems like a possible
> reason why we didn't add From to this routine in the first place.
>
> I have no idea which of these semantics is right. Index doesn't care
> what precedes From, so it is no help.

It is help -- Find_Token should also not care what precedes From. Note that
the existing Find_Token doesn't care what precedes Source'First. It's the
"immediately before and after" wording that is misleading you -- there's
nothing immediately before Source'First.

> Returning "45" from the string " 345 " seems wrong. OTOH, the similar call:
>    Find_Token (Source(4..7), Set, Test => Outside,
>                First => First, Last => Last);
> does return First = 4, Last = 5.
>
> Thoughts? And if you want the alternative wording, how would you word it?

See my suggestion above.

>...I had enough trouble getting the above wording to make sense.

Pascal Obry started all this. I suggest you verify the wording with him. I
don't know how to do that, since we (annoyingly) don't allow cc's, as if only
ARG members may have relevant expertise.

****************************************************************

From: Randy Brukardt
Sent: Friday, February 12, 2010  1:50 PM

...
> Can't we just say, "equivalent to Find_Token(Source(From..Source'Last),
> Set, Test, First, Last)"?

Not if we want to be consistent. All of the Index routines have the version
with a From parameter as the one that does the defining. I don't know why I
did that in hindsight, but too late now.

Anyway, part of this AI is to change the existing Find_Token to say:

   Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).

So the above suggestion would look pretty silly. :-)

> Or, "does the same thing as the previous Find_Token procedure, passing
> Source(From..Source'Last) as the Source parameter".

And so would this. Besides, the version with From comes first (the current
one follows it). This again follows what I did with Index.

> > Unfortunately, it's not clear this is what is intended.
>
> Indeed, it's clear that it's not what is intended. ;-)

I still disagree. The invariant is that the longest token is returned. The
only reason that characters before Source'First are ignored is because they
don't exist.
When this happens in our compiler, I work both ways from the starting point
to ensure that the entire token is determined correctly. (I forget the case
where that came up; it probably was related to error handling or debugging or
something like that where the source address is not necessarily accurately
known.)

> >...What is supposed to
> > happen if From is in the middle of a token?
>
> It usually won't be, because the intended use is to loop repeatedly
> finding tokens.

True enough. But that doesn't give us the ability to ignore what will happen
in that case.

> But if it is, ignore that fact -- you don't want it to look at any
> character before From. You told it to start searching at From, so
> that's what it should do.

I'm still dubious, although perhaps this is a don't care case, and maybe the
equivalence with a slice is felt to be compelling. I don't find it compelling
because it ruins the invariant.

...
> > I have no idea which of these semantics is right. Index doesn't care
> > what precedes From, so it is no help.
>
> It is help -- Find_Token should also not care what precedes From.
> Note that the existing Find_Token doesn't care what precedes
> Source'First. It's the "immediately before and after"
> wording that is misleading you -- there's nothing immediately before
> Source'First.

I don't think that's "misleading" me; it defines the invariant of the
routine. You're proposing to abandon that invariant and substitute another.
Maybe that's OK, but only because it isn't the intended use of the routine.

> > Returning "45" from the string " 345 " seems wrong. OTOH, the
> > similar call:
> >    Find_Token (Source(4..7), Set, Test => Outside,
> >                First => First, Last => Last);
> > does return First = 4, Last = 5.
> >
> > Thoughts? And if you want the alternative wording, how would you word it?
>
> See my suggestion above.

Doesn't work. Please try again. :-)

> >...I had enough trouble getting the above wording to make sense.
>
> Pascal Obry started all this. I suggest you verify the wording with him.
> I don't know how to do that, since we (annoyingly) don't allow cc's,
> as if only ARG members may have relevant expertise.

I find this to be an ARG-level angels on a pinhead sort of discussion. Real
users don't care about the exact wording, they only care that it does what
they expect in the normal case. Which either semantics will do. So I doubt
that there is an opinion here.

And in any case, there isn't an absolute ban on ccs. They're allowed to
individual users (*not* mailing lists) when they're directly relevant to the
discussion. The bigger problem with them is that people forget to continue
them, or forget to cc the list. So it's better to avoid them, and in any
case, real technical discussion belongs on Ada-Comment if the public is to be
involved. But I didn't think the public would care about this
angels-on-the-head-of-a-pin discussion.

****************************************************************

From: Bob Duff
Sent: Friday, February 12, 2010  2:56 PM

It would be good if other ARG members would weigh in on this earth-shattering
issue. ;-)

...
> Not if we want to be consistent. All of the Index routines have the
> version with a From parameter as the one that does the defining. I
> don't know why I did that in hindsight, but too late now.
>
> Anyway, part of this AI is to change the existing Find_Token to say:
>
>    Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
>
> So the above suggestion would look pretty silly. :-)

So don't do that. ;-)
We can define one in terms of the other, or the other in terms of the one. I
agree it would be nice to be consistent with Index, but it wouldn't be the
end of the world to do it the other way around.

> I still disagree. The invariant is that the longest token is returned.

The longest one starting at From. It seems clear to me that if you specify
From, you don't want to look at characters before From. I can't imagine why
you think otherwise, so I don't know how to argue against that -- it just
seems obvious to me that From indicates the starting point of the search.

> I'm still dubious, although perhaps this is a don't care case, and
> maybe the equivalence with a slice is felt to be compelling. I don't
> find it compelling because it ruins the invariant.

I don't think it's a don't care case. And I don't see why you want to apply
the invariant for the old Find_Token to the new one. (Don't you mean
"postcondition", not "invariant"?)

> Doesn't work. Please try again. :-)

It works. You just don't like it because it's inconsistent.

> I find this to be an ARG-level angels on a pinhead sort of discussion.
> Real users don't care about the exact wording, they only care that it
> does what they expect in the normal case. Which either semantics will
> do. So I doubt that there is an opinion here.

We're not just arguing about wording. We're also arguing about what it should
do. I think users care about that.

OK, if you insist on consistency, define the semantics of the new one like
this:

   First is the index of the first character in Source(From..Source'Last)
   that satisfies the Test condition. Last is the largest index such that all
   characters in Source(First..Last) satisfy the Test condition. If no
   characters in Source(From..Source'Last) satisfy the Test condition, First
   is From, and Last is 0.

The last part, about C_E, is no longer needed.

And the old one like this (as you said):

   Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).

   [AARM Note: If Source'First is not in Positive, which can only happen for
   an empty string, this will raise Constraint_Error.]

This wording reflects my (obviously correct! ;-)) opinion about what the new
one with From should do, and it doesn't change what the old one without From
does. If you don't agree on the behavior, you won't like my wording.

****************************************************************

From: Steve Baird
Sent: Friday, February 12, 2010  3:14 PM

> It would be good if other ARG members would weigh in on this
> earth-shattering issue. ;-)

It looks like you two are converging nicely on a solution.

On the main question of whether this form is equivalent to passing in a slice
and on how the function should behave, I agree with Bob.

And I agree with Randy (and it sounds like Bob does too, now that Randy has
identified the issue) about the need for consistency in the wording and for
avoiding circular definitions.

****************************************************************

From: Tucker Taft
Sent: Friday, February 12, 2010  3:19 PM

I agree with Bob that if you specify From, it is as though the characters
before From don't exist at all. You shouldn't be looking at them.

You use an operation like this to walk your way through a string. You
wouldn't want a token returned from a second call to overlap the token
returned from the first call, presuming you set "From" to one past the end of
the first token returned.
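[Editor's sketch: the walking pattern that Tucker describes might look like
the following. This is only an illustration; it assumes an implementation
that already provides the Find_Token profile proposed by this AI, and the
program name Walk_Tokens is made up.]

   with Ada.Strings.Fixed;
   with Ada.Strings.Maps.Constants;
   with Ada.Text_IO;

   procedure Walk_Tokens is
      use Ada.Strings;
      Source : constant String := "  one two  three ";
      From   : Positive := Source'First;
      First  : Positive;
      Last   : Natural;
   begin
      loop
         Fixed.Find_Token (Source, Maps.Constants.Letter_Set, From,
                           Inside, First, Last);
         exit when Last = 0;              -- no further token after From
         Ada.Text_IO.Put_Line (Source (First .. Last));
         exit when Last >= Source'Last;   -- keep From within Source'Range
         From := Last + 1;                -- one past the previous token
      end loop;
   end Walk_Tokens;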
****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 12, 2010  3:33 PM

> I still disagree. The invariant is that the longest token is returned.
> The only reason that characters before Source'First are ignored is
> because they don't exist. When this happens in our compiler, I work
> both ways from the starting point to ensure that the entire token is
> determined correctly. (I forget the case where that came up; it
> probably was related to error handling or debugging or something like
> that where the source address is not necessarily accurately known.)

Here is an example: I parse a command line, and there is a -o option to
redirect output. Following stupid Unix convention, no space between -o and
file name. My scanner would go like this:

1) A '-': this is an option
2) A 'o': let's get the rest of the string up to the first space.

Clearly, I want to get the rest of the token after the 'o'. *I* decide where
the real token starts.

****************************************************************
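[Editor's sketch: Jean-Pierre's -o case could be handled as follows with the
proposed Find_Token. This is only an illustration; the program name
Parse_Output_Option and the argument value are made up.]

   with Ada.Strings.Fixed;
   with Ada.Strings.Maps;
   with Ada.Text_IO;

   procedure Parse_Output_Option is
      use Ada.Strings;
      Arg   : constant String := "-ooutput.txt";
      Blank : constant Maps.Character_Set := Maps.To_Set (' ');
      First : Positive;
      Last  : Natural;
   begin
      if Arg'Length > 2 and then Arg (Arg'First .. Arg'First + 1) = "-o" then
         -- Start scanning just after the "-o"; characters before From are
         -- ignored, so the option prefix cannot be absorbed into the token.
         Fixed.Find_Token (Arg, Blank, From => Arg'First + 2,
                           Test => Outside, First => First, Last => Last);
         Ada.Text_IO.Put_Line ("output file: " & Arg (First .. Last));
      end if;
   end Parse_Output_Option;

****************************************************************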