CVS difference for ai05s/ai05-0031-1.txt

Differences between 1.3 and version 1.4

--- ai05s/ai05-0031-1.txt	2010/02/12 05:24:21	1.3
+++ ai05s/ai05-0031-1.txt	2010/02/16 03:02:06	1.4
@@ -1,4 +1,4 @@
-!standard A.4.3(16)                                10-02-11    AI05-0031-1/01
+!standard A.4.3(16)                                10-02-15    AI05-0031-1/02
 !standard A.4.3(67)
 !standard A.4.3(68/1)
 !standard A.4.4(51)
@@ -52,19 +52,21 @@
                          First  : out Positive;
                          Last   : out Natural);
 
+If From is not in Source'Range, then Index_Error is raised. Otherwise,
+First is set to the index of the first character in Source(From..Source'Last)
+that satisfies the Test condition. Last is set to the largest index such that
+all characters in Source(First..Last) satisfy the Test condition. If no
+characters in Source(From..Source'Last) satisfy the Test condition, First
+is set to From, and Last is set to 0.
 
-Find_Token returns in First and Last the indices of the beginning and end of the
-first slice of Source where First >= From, all of the elements of the slice
-satisfy the Test condition, and such that the elements (if any) immediately
-before and after the slice do not satisfy the Test condition. If no such slice
-exists, then the value returned for Last is zero, and the value returned for
-First is From; however, if From is not in Positive then Constraint_Error is
-raised.
-
 Replace A.4.3(68/1) by:
 
 Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
 
+AARM Ramification:
+If Source'First is not in Positive, which can only happen for an empty string,
+this will raise Constraint_Error.
+
 Add before A.4.4(51):
 
    procedure Find_Token (Source : in Bounded_String;
@@ -89,30 +91,28 @@
 packages should have similar capabilities. Note we add the new routines
 before the old ones to be consistent with how that was done for function
 Index.
-
-** Possible bug **
 
-Consider the following:
+We reworded the definition of Find_Token to be simpler and to make it clear
+that the longest possible slice starting at From is returned. That is not
+exactly the same as ensuring that the character immediately before the
+returned slice does not satisfy the test condition. To see the difference,
+consider the following:
 
 Source(1..7) := "1234567";
 Set := "345";
 Find_Token (Source, Set, From => 3, Test => Inside, First => First, Last => Last);
 -- After this call, First = 3, Last = 5.
 Find_Token (Source, Set, From => 4, Test => Inside, First => First, Last => Last);
--- After this call, First = 4, Last = 0.
-
-The latter result requires some explanation. The wording requires three things
-about a slice: First >= From, all the elements in the slice satisfy the test,
-*and* the elements immediately before and after the slice do not satisfy the
-test condition. The slice 3 .. 5 fails the first test, and the slice 4 .. 5
-fails that last test (because the element at From-1 *does* satisfy the test).
+-- After this call, First = 4, Last = 5.
 
-I considered an alternative wording where the string would act like was
-truncated at From. That however has the effect of possibly splitting tokens,
-which seems weird. But the defined semantics is weird, too. This seems like a
-possible reason why we didn't add From to this routine in the first place.
+In the latter result, the character at (3) in the source string does meet the
+condition, but that fact is ignored because From is greater than 3.
 
-** Decide on proper semantics **
+This interpretation makes
+Find_Token (Source(From..Source'Last), Set, Test, First, Last) give the same
+results as
+Find_Token (Source, Set, From, Test, First, Last);
+which seems to be the most natural interpretation.
 
 !example
 
@@ -784,5 +784,327 @@
 
 Thoughts? And if you want the alternative wording, how would you word it? I had
 enough trouble getting the above wording to make sense.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, February 12, 2010  8:41 AM
+
+> Find_Token returns in First and Last the indices of the beginning and
+> end of the first slice of Source where First >= From, all of the
+> elements of the slice satisfy the Test condition, and such that the
+> elements (if any) immediately before and after the slice do not
+> satisfy the Test condition. If no such slice exists, then the value
+> returned for Last is zero, and the value returned for First is From;
+> however, if From is not in Positive then Constraint_Error is raised.
+
+The existing wording "immediately before and after" is kind of bogus, IMHO.
+"Immediately before" is implied by "first slice".  And the point of "after" is
+to require it to be the longest such slice.  I would have worded it using
+"longest such slice" or something like that, but I don't suggest we change it
+now.
+
+The "if From is not in Positive" part goes without saying, because the subtype
+of From is Positive.
+
+Can't we just say, "equivalent to Find_Token(Source(From..Source'Last),
+Set, Test, First, Last)"?
+
+Or, "does the same thing as the previous Find_Token procedure, passing
+Source(From..Source'Last) as the Source parameter".
+
+> Unfortunately, it's not clear this is what is intended.
+
+Indeed, it's clear that it's not what is intended.  ;-)
+
+>...What is supposed to
+> happen if From is in the middle of a token?
+
+It usually won't be, because the intended use is to loop repeatedly finding
+tokens.
+
+But if it is, ignore that fact -- you don't want it to look at any character
+before From.  You told it to start searching at From, so that's what it should
+do.
+
+> Source(1..7) := "  345  ";
+> Set := " ";
+> Find_Token (Source, Set, From => 3, Test => Outside, First => First,
+> Last => Last);
+> -- After this call, First = 3, Last = 5.
+
+Right.
+
+> Find_Token (Source, Set, From => 4, Test => Outside, First => First,
+> Last => Last);
+> -- After this call, First = 4, Last = 0.
+
+We want this to return 4..5, not 4..0.
+
+> I considered an alternative wording where the string would act like
+> was truncated at From.
+
+That's what you want.
+
+>... That however has the effect of possibly splitting tokens, which
+> seems weird.
+
+I don't think it's a problem.
+
+>...But
+> the defined semantics is weird, too. This seems like a possible reason
+> why we didn't add From to this routine in the first place.
+>
+> I have no idea which of these semantics is right. Index doesn't care
+> what precedes From, so it is no help.
+
+It is help -- Find_Token should also not care what precedes From.
+Note that the existing Find_Token doesn't care what precedes Source'First.  It's
+the "immediately before and after" wording that is misleading you -- there's
+nothing immediately before Source'First.
+
+> Returning "45" from the string "  345  " seems wrong. OTOH, the
+> similar
+> call:
+> Find_Token (Source(4..7), Set, Test => Outside, First => First, Last
+> => Last); does return First = 4, Last = 5.
+>
+> Thoughts? And if you want the alternative wording, how would you word it?
+
+See my suggestion above.
+
+>...I had enough trouble getting the above wording to make sense.
+
+Pascal Obry started all this.  I suggest you verify the wording with him.
+I don't know how to do that, since we (annoyingly) don't allow cc's, as if only
+ARG members may have relevant expertise.
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, February 12, 2010  1:50 PM
+
+...
+> Can't we just say, "equivalent to
+> Find_Token(Source(From..Source'Last),
+> Set, Test, First, Last)"?
+
+Not if we want to be consistent. All of the Index routines have the version with
+a From parameter as the one that does the defining. I don't know why I did that
+in hindsight, but too late now.
+
+Anyway, part of this AI is to change the existing Find_Token to say:
+
+   Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
+
+So the above suggestion would look pretty silly. :-)
+
+> Or, "does the same thing as the previous Find_Token procedure, passing
+> Source(From..Source'Last) as the Source parameter".
+
+And so would this. Besides, the version with From comes first (the current one
+follows it). This again follows what I did with Index.
+
+> > Unfortunately, it's not clear this is what is intended.
+>
+> Indeed, it's clear that it's not what is intended.  ;-)
+
+I still disagree. The invariant is that the longest token is returned. The only
+reason that characters before Source'First are ignored is because they don't
+exist. When this happens in our compiler, I work both ways from the starting
+point to ensure that the entire token is determined correctly. (I forget the
+case where that came up; it probably was related to error handling or debugging
+or something like that where the source address is not necessarily accurately
+known.)
+
+> >...What is supposed to
+> > happen if From is in the middle of a token?
+>
+> It usually won't be, because the intended use is to loop repeatedly
+> finding tokens.
+
+True enough. But that doesn't give us the ability to ignore what will happen in
+that case.
+
+> But if it is, ignore that fact -- you don't want it to look at any
+> character before From.  You told it to start searching at From, so
+> that's what it should do.
+
+I'm still dubious, although perhaps this is a don't care case, and maybe the
+equivalence with a slice is felt to be compelling. I don't find it compelling
+because it ruins the invariant.
+
+...
+> > I have no idea which of these semantics is right. Index doesn't care
+> > what precedes From, so it is no help.
+>
+> It is help -- Find_Token should also not care what precedes From.
+> Note that the existing Find_Token doesn't care what precedes
+> Source'First.  It's the "immediately before and after"
+> wording that is misleading you -- there's nothing immediately before
+> Source'First.
+
+I don't think that's "misleading" me; it defines the invariant of the routine.
+You're proposing to abandon that invariant and substitute another. Maybe that's
+OK, but only because it isn't the intended use of the routine.
+
+> > Returning "45" from the string "  345  " seems wrong. OTOH, the
+> > similar call:
+> > Find_Token (Source(4..7), Set, Test => Outside, First => First, Last
+> > => Last); does return First = 4, Last = 5.
+> >
+> > Thoughts? And if you want the alternative wording, how
+> would you word it?
+>
+> See my suggestion above.
+
+Doesn't work. Please try again. :-)
+
+> >...I
+> > had enough trouble getting the above wording to make sense.
+>
+> Pascal Obry started all this.  I suggest you verify the wording with
+> him.
+> I don't know how to do that, since we (annoyingly) don't allow cc's,
+> as if only ARG members may have relevant expertise.
+
+I find this to be an ARG-level angels on a pinhead sort of discussion. Real
+users don't care about the exact wording, they only care that it does what they
+expect in the normal case. Which either semantics will do. So I doubt that there
+is an opinion here.
+
+And in any case, there isn't an absolute ban on ccs. They're allowed to
+individual users (*not* mailing lists) when they're directly relevant to the
+discussion. The bigger problem with them is that people forget to continue them,
+or forget to cc the list. So it's better to avoid them, and in any case, real
+technical discussion belongs on Ada-Comment if the public is to be involved. But
+I didn't think the public would care about this angels-on-the-head-of-a-pin
+discussion.
+
+****************************************************************
+
+From: Bob Duff
+Sent: Friday, February 12, 2010  2:56 PM
+
+It would be good if other ARG members would weigh in on this earth-shattering
+issue.  ;-)
+
+...
+> Not if we want to be consistent. All of the Index routines have the
+> version with a From parameter as the one that does the defining. I
+> don't know why I did that in hindsight, but too late now.
+>
+> Anyway, part of this AI is to change the existing Find_Token to say:
+>
+>    Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
+>
+> So the above suggestion would look pretty silly. :-)
+
+So don't do that.  ;-)
+
+We can define one in terms of the other, or the other in terms of the one.  I
+agree it would be nice to be consistent with Index, but it wouldn't be the end
+of the world to do it the other way around.
+
+> I still disagree. The invariant is that the longest token is returned.
+
+The longest one starting at From.
+
+It seems clear to me that if you specify From, you don't want to look at
+characters before From. I can't imagine why you think otherwise, so I don't know
+how to argue against that -- it just seems obvious to me that From indicates the
+starting point of the search.
+
+> I'm still dubious, although perhaps this is a don't care case, and
+> maybe the equivalence with a slice is felt to be compelling. I don't
+> find it compelling because it ruins the invariant.
+
+I don't think it's a don't care case.
+
+And I don't see why you want to apply the invariant for the old Find_Token to
+the new one. (Don't you mean "postcondition", not "invariant"?)
+
+> Doesn't work. Please try again. :-)
+
+It works.  You just don't like it because it's inconsistent.
+
+> I find this to be an ARG-level angels on a pinhead sort of discussion.
+> Real users don't care about the exact wording, they only care that it
+> does what they expect in the normal case. Which either semantics will
+> do. So I doubt that there is an opinion here.
+
+We're not just arguing about wording.  We're also arguing about what it should
+do.  I think users care about that.
+
+OK, if you insist on consistency, define the semantics of the new one like this:
+
+    First is the index of the first character in Source(From..Source'Last) that
+    satisfies the Test condition.  Last is the largest index such that all
+    characters in Source(First..Last) satisfy the Test condition.  If no
+    characters in Source(From..Source'Last) satisfy the Test condition, First
+    is From, and Last is 0.
+
+The last part, about C_E is no longer needed.
+
+And the old one like this (as you said):
+
+    Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
+    [AARM Note: If Source'First is not in Positive, which can
+    only happen for an empty string, this will raise Constraint_Error.]
+
+This wording reflects my (obviously correct! ;-)) opinion about what the new one
+with From should do, and it doesn't change what the old one without From does.
+If you don't agree on the behavior, you won't like my wording.
+
+****************************************************************
+
+From: Steve Baird
+Sent: Friday, February 12, 2010  3:14 PM
+
+> It would be good if other ARG members would weigh in on this
+> earth-shattering issue.  ;-)
+
+It looks like you two are converging nicely on a solution. On the main question
+of whether this form is equivalent to passing in a slice and on how the function
+should behave, I agree with Bob.
+
+And I agree with Randy (and it sounds like Bob does too, now that Randy has
+identified the issue) about the need for consistency in the wording and for
+avoiding circular definitions.
+
+****************************************************************
+
+From: Tucker Taft
+Sent: Friday, February 12, 2010  3:19 PM
+
+I agree with Bob that if you specify From, it is as though the characters before
+From don't exist at all.  You shouldn't be looking at them.
+
+You use an operation like this to walk your way through a string.  You wouldn't
+want a token returned from a second call to overlap the token returned from the
+first call, presuming you set "From" to one past the end of the first token
+returned.
+
+****************************************************************
+
+From: Jean-Pierre Rosen
+Sent: Friday, February 12, 2010  3:33 PM
+
+> I still disagree. The invariant is that the longest token is returned.
+> The only reason that characters before Source'First are ignored is
+> because they don't exist. When this happens in our compiler, I work
+> both ways from the starting point to ensure that the entire token is
+> determined correctly. (I forget the case where that came up; it
+> probably was related to error handling or debugging or something like
+> that where the source address is not necessarily accurately known.)
+
+Here is an example: I parse a command line, and there is a -o option to redirect
+output. Following stupid Unix convention, no space between -o and file name.
+
+My scanner would go like this:
+1) A '-': this is an option
+2) A 'o': let's get the rest of the string up to the first space.
+Clearly, I want to get the rest of the token after the 'o'. *I* decide where the
+real token starts.
 
 ****************************************************************
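
Illustration (not part of the AI text): the reworded Find_Token rule in this
revision can be modeled with a short Ada sketch. This is a minimal sketch under
stated assumptions, not the library's actual body. Model_Find_Token, Show and
Find_Token_Sketch are hypothetical names introduced here; only the standard
packages Ada.Strings, Ada.Strings.Maps, Ada.Strings.Fixed and Ada.Text_IO are
assumed. The model follows the new wording literally: check From, scan forward
for the first character that satisfies Test, then extend Last for as long as
the test keeps holding. The final call uses the existing Find_Token (the one
without From) on a slice, for comparison.

```ada
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;  use Ada.Strings.Maps;
with Ada.Text_IO;

procedure Find_Token_Sketch is

   --  Hypothetical model of the reworded rule; not the library's body.
   procedure Model_Find_Token
     (Source : in     String;
      Set    : in     Character_Set;
      From   : in     Positive;
      Test   : in     Membership;
      First  :    out Positive;
      Last   :    out Natural)
   is
      --  True when C meets the Test condition with respect to Set.
      function Satisfies (C : Character) return Boolean is
      begin
         return Is_In (C, Set) = (Test = Inside);
      end Satisfies;
   begin
      if From not in Source'Range then
         raise Index_Error;
      end if;

      --  Characters before From are never examined.
      for J in From .. Source'Last loop
         if Satisfies (Source (J)) then
            First := J;
            Last  := J;
            --  Extend to the longest run that still satisfies Test.
            while Last < Source'Last
              and then Satisfies (Source (Last + 1))
            loop
               Last := Last + 1;
            end loop;
            return;
         end if;
      end loop;

      --  No character in Source (From .. Source'Last) satisfies Test.
      First := From;
      Last  := 0;
   end Model_Find_Token;

   Source : constant String (1 .. 7) := "1234567";
   Set    : constant Character_Set   := To_Set ("345");
   First  : Positive;
   Last   : Natural;

   --  Print the current First/Last pair with a label.
   procedure Show (Label : in String) is
   begin
      Ada.Text_IO.Put_Line
        (Label & ": First =" & Positive'Image (First)
         & ", Last =" & Natural'Image (Last));
   end Show;

begin
   Model_Find_Token (Source, Set, 3, Inside, First, Last);
   Show ("From => 3");

   Model_Find_Token (Source, Set, 4, Inside, First, Last);
   Show ("From => 4");

   --  Existing Find_Token without From, applied to the slice.
   Ada.Strings.Fixed.Find_Token
     (Source (4 .. Source'Last), Set, Inside, First, Last);
   Show ("slice 4 .. 7");
end Find_Token_Sketch;
```

With the values used in the discussion, the model yields 3 .. 5 for From => 3
and 4 .. 5 for From => 4, and the slice call also yields 4 .. 5. This matches
the equivalence with Source(From..Source'Last) that the revised discussion
section describes.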
