Version 1.4 of ai05s/ai05-0031-1.txt


!standard A.4.3(16)          10-02-15 AI05-0031-1/02
!standard A.4.3(67)
!standard A.4.3(68/1)
!standard A.4.4(51)
!standard A.4.5(46)
!class Amendment 06-11-03
!status work item 06-11-03
!status received 06-11-03
!priority Low
!difficulty Easy
!subject Add a From parameter to Find_Token
!summary
(See proposal.)
!problem
Find_Token in Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded has no version with a From index (the index at which to start looking), even though such a parameter has been added to the Index routines in those same packages.
It is important to be able to start in the middle of a long string when iterating to find multiple tokens.
!proposal
Add a From parameter to Find_Token in all of the predefined string packages.
!wording
Add before A.4.3(16):
   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);
[Editor's Note: The From parameter in the third Index is placed before the Test : in Membership parameter. I'm not sure why, but I put it in the same place here, to be consistent. The worst thing would be to have it in all different places.]
Add before A.4.3(67):
   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);
If From is not in Source'Range, then Index_Error is raised. Otherwise, First is set to the index of the first character in Source(From..Source'Last) that satisfies the Test condition. Last is set to the largest index such that all characters in Source(First..Last) satisfy the Test condition. If no characters in Source(From..Source'Last) satisfy the Test condition, First is set to From, and Last is set to 0.
Replace A.4.3(68/1) by:
Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
AARM Ramification: If Source'First is not in Positive, which can only happen for an empty string, this will raise Constraint_Error.
Add before A.4.4(51):
   procedure Find_Token (Source : in Bounded_String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);
Add before A.4.5(46):
   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);
!discussion
This is a consistency change; all of the searching routines in the Strings packages should have similar capabilities. Note that we add the new routines before the old ones, consistent with how that was done for function Index.
We reworded the definition of Find_Token to be simpler and to make it clear that the longest possible slice starting at From is returned. That is not exactly the same as ensuring that the character immediately before the returned slice does not satisfy the test condition. To see the difference, consider the following:
   Source(1..7) := "1234567";
   Set := "345";
   Find_Token (Source, Set, From => 3, Test => Inside, First => First, Last => Last);
   -- After this call, First = 3, Last = 5.
   Find_Token (Source, Set, From => 4, Test => Inside, First => First, Last => Last);
   -- After this call, First = 4, Last = 5.
In the latter result, the character at (3) in the source string does meet the condition, but that fact is ignored because From is greater than 3.
This interpretation makes Find_Token (Source(From..Source'Last), Set, Test, First, Last) give the same results as Find_Token (Source, Set, From, Test, First, Last), which seems to be the most natural behavior.
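The semantics described above can be modeled directly. The following Python sketch is purely illustrative (the function name, the test predicate, and the exception choice are ours, not part of the proposed wording); it uses 1-based indices, as in Ada.

```python
def find_token(source, test, from_index):
    # Model of the proposed Find_Token-with-From semantics, 1-based
    # indices as in Ada. Returns (First, Last); raises IndexError to
    # stand in for Index_Error when From is not in Source'Range.
    n = len(source)
    if not (1 <= from_index <= n):
        raise IndexError("From not in Source'Range")
    i = from_index
    # Find the first character at or after From satisfying the test.
    while i <= n and not test(source[i - 1]):
        i += 1
    if i > n:
        return (from_index, 0)        # no token found
    first = i
    # Extend to the largest Last such that First..Last all satisfy it.
    while i <= n and test(source[i - 1]):
        i += 1
    return (first, i - 1)

inside = lambda c: c in "345"
print(find_token("1234567", inside, 3))   # (3, 5)
print(find_token("1234567", inside, 4))   # (4, 5)
```

Note that running the model on the slice Source(From..Source'Last) and shifting the result by From - 1 gives the same answers, which is the slice equivalence claimed above.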
!example
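The motivating usage is iterating over all tokens in a long string by restarting each search just past the previous token. The following Python sketch models that pattern (illustrative only; the generator name and predicate are invented, and indices are 1-based as in Ada).

```python
def tokens(source, test):
    # Yield (First, Last) for each maximal run of characters satisfying
    # the test, resuming with From = Last + 1 after each token -- the
    # iteration pattern that the proposed From parameter supports
    # without repeatedly re-slicing the source string.
    n = len(source)
    frm = 1
    while frm <= n:
        i = frm
        while i <= n and not test(source[i - 1]):
            i += 1
        if i > n:
            return
        first = i
        while i <= n and test(source[i - 1]):
            i += 1
        yield (first, i - 1)
        frm = i

letters = lambda c: c.isalpha()
print(list(tokens("ab, cd, ef", letters)))   # [(1, 2), (5, 6), (9, 10)]
```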
!ACATS test
!appendix

From: Pascal Obry
Sent: Monday, October 30, 2006  2:21 PM

I just noticed that Find_Token (in Fixed, Unbounded, and Bounded) has
no version with a From index. This is especially important when
iterating over a long string to find multiple tokens. Such a From index
(the index where to start looking) has been added to the Index routines.
Why not for Find_Token?

The only version is:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

I would like to propose this :

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         From   : in Positive;
                         First  : out Positive;
                         Last   : out Natural);

Here, From is the starting position at which to look for the given token. An
alternate solution could be to use First:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : in out Positive;
                         Last   : out Natural);


In this case the First parameter is changed to mode "in out", the
initial value being the starting position to look for the given token.
This last solution looks better to me.

Thoughts ?

****************************************************************

From: Adam Beneschan
Sent: Friday, November 3, 2006  1:07 PM

> I just noticed that Find_Token (in Fixed, Unbounded, and Bounded) has
> no version with a From index. This is especially important when
> iterating over a long string to find multiple tokens. Such a From index
> (the index where to start looking) has been added to the Index routines.
> Why not for Find_Token?

I just checked and Find_Token is not mentioned at all in AI-301
(including all of the e-mail).  Looks to me like nobody else noticed
it.  I think you're right, this is an omission.

> The only version is:
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : out Positive;
>                          Last   : out Natural);
>
> I would like to propose this :
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          From   : in Positive;
>                          First  : out Positive;
>                          Last   : out Natural);
>
> Here, From is the starting position at which to look for the given token. An
> alternate solution could be to use First:
>
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : in out Positive;
>                          Last   : out Natural);
>
>
> In this case the First parameter is changed to mode "in out", the
> initial value being the starting position to look for the given token.
> This last solution looks better to me.
>
> Thoughts ?

I definitely like the first solution (separate From and First
parameters) better.  If the second solution were adopted, I think a
call to it would look confusing, since the parameter would have to be
a variable used for one meaning before the call and a different
(although vaguely similar) meaning after the call.  Anyway, I've seen
code that calls routines like that and I always end up scratching my
head trying to figure out what the heck is going on.

****************************************************************

From: Randy Brukardt
Sent: Friday, November 3, 2006  11:28 PM

> I just checked and Find_Token is not mentioned at all in AI-301
> (including all of the e-mail).  Looks to me like nobody else noticed
> it.

I'm not sure that anyone knows that Find_Token exists or what it does. So
it's not surprising that it didn't immediately come to mind. Anyway, I think
you could make the argument that the "From" parameter is useful for pretty
much all of the Unbounded string routines, but it is really easy for that to
turn into feeping creaturism. (It's hard to find much use for most of the
Unbounded string routines anyway.) So where do you draw the line?

I suspect that adding much more to AI-301 would have killed it (it was a
tough sell originally), so I think it was best that Find_Token was left out.
That doesn't mean that we shouldn't think about adding it in the future.

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006 12:43 AM

> I'm not sure that anyone knows that Find_Token exists or what it does. So
> it's not surprising that it didn't immediately come to mind. Anyway, I think
> you could make the argument that the "From" parameter is useful for pretty
> much all of the Unbounded string routines, but it is really easy for that to

Why all? Apart from Index and Find_Token, which can be used repeatedly to
look for some pattern in a string, I don't see the need for the others.

> turn into feeping creaturism. (It's hard to find much use for most of the
> Unbounded string routines anyway.) So where do you draw the line?

Hard to find much use? OK, I must be different then :) Frankly, this is
quite a nice addition to Ada 95, and there are services in
Ada.Strings.Unbounded that I use all the time! I definitely think that
improving it is very important, hence my Find_Token proposal. The better
the interface, the more it will be used!

The solution to my problem today is to convert the unbounded string to a
String and take successive slices to pass to Find_Token. This is not
acceptable for a language like Ada!

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  2:22 PM

> The solution to my problem today is to convert the unbounded string to a
> String and take successive slices to pass to Find_Token. This is not
> acceptable for a language like Ada!

Why not use Ada.Strings.Unbounded.Slice?

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006  2:36 PM

Performance?

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  8:18 PM

Then you probably shouldn't be using Ada.Strings.Unbounded.

****************************************************************

From: Pascal Obry
Sent: Tuesday, November 7, 2006  1:33 AM

Just because unbounded strings are slower than standard strings does not
mean I must accept an even worse implementation of Find_Token.
Dealing with unbounded strings directly is fine; it is the conversions
from/back to String that hurt performance. I want to avoid that.

Note also that with a good cache, unbounded strings are not that
slow. See the GNAT implementation for example.

And we are speaking of a very simple addition, which looks worth it to me.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, November 7, 2006  6:25 PM

> Just because unbounded strings are slower than standard strings does not
> mean I must accept an even worse implementation of Find_Token.
> Dealing with unbounded strings directly is fine; it is the conversions
> from/back to String that hurt performance. I want to avoid that.

Then Jeff is right. To use the unbounded strings package requires lots of
conversions back and forth, simply because most of the operations in the
unbounded strings package take String, not Unbounded_String, arguments. For
instance, my spam filter does a lot of searching for patterns (stored as
lists of unbounded strings) in messages (stored as lists of unbounded
strings). The patterns have to be converted to strings on every use - ugh.
[Yes, I could have stored the patterns as regular strings, but then I'd have
to do a lot of memory management on the lists of patterns. And if I did
that, I would necessarily convert the messages (since they're stored in the
same type) to regular strings as well -- and I wouldn't use unbounded
strings at all.]

So if you need maximum performance, you can't use unbounded strings. If the
memory management aspects are more important to you than performance, then
the extra conversions cannot be a big deal. You can't have it both ways
(mainly because Ada doesn't have a way to give string literals to private
types -- but even if it did, you'd need a package quite different than
unbounded strings).

> Note also that with a good cache, the unbounded strings are not that
> slow. See the GNAT implementation for example.
>
> And we are speaking of a very simple addition, looks worth it to me.

But remember that any change to the standard packages is (potentially)
incompatible. We need a strong justification to introduce incompatibilities.
We took a somewhat lower hurdle for incompatibilities in the Amendment,
because it represented a major update and we expected users to be
unsurprised by minor glitches from rare incompatibilities.

Note that we do *not* allow these new routines in Ada 95 implementations.
That's specifically because of the compatibility concerns -- we do not want
programs that work on one Ada 95 compiler to fail on another because of the
presence or absence of these new routines.

But the Amendment is done now, and it is in use (at least with GNAT).
Changes now have a higher burden. Of course, if there is an actual bug
(wrong mode, wrong type, etc.), that should be fixed, but we're not in the
business of making changes that might break real, existing programs simply
because it seems inconsistent and it is a "simple addition".

If this comment had been made a year ago while the Amendment was still
being finalized, the change might very well have been made. But the
Amendment is frozen (and mostly approved) and in use. In my opinion,
nice-to-haves have to wait for the next revision/Amendment. Whenever that
is.

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, November 7, 2006  8:11 PM

> Then Jeff is right. To use the unbounded strings package requires lots of
> conversions back and forth, simply because most of the operations in the
> unbounded strings package take String, not Unbounded_String, arguments. For
> instance, my spam filter does a lot of searching for patterns (stored as
> lists of unbounded strings) in messages (stored as lists of unbounded
> strings). The patterns have to be converted to strings on every use - ugh.
> [Yes, I could have stored the patterns as regular strings, but then I'd have
> to do a lot of memory management on the lists of patterns. And if I did
> that, I would necessarily convert the messages (since they're stored in the
> same type) to regular strings as well -- and I wouldn't use unbounded
> strings at all.]

What he said.

> If this comment had been made a year ago while the Amendment was still
> being finalized, the change might very well have been made. But the
> Amendment is frozen (and mostly approved) and in use. In my opinion,
> nice-to-haves have to wait for the next revision/Amendment. Whenever that
> is.

My guess is 2019.

****************************************************************

From: John Barnes
Sent: Wednesday, November 8, 2006  1:27 AM

> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

One of the features that Tuck proposed when doing Ada 9x was to allow the
definition of literals for private types. I thought it was a wonderful idea
and still miss it. But it was killed at an early stage.

A thought for Ada 2016?

****************************************************************

From: Christoph Grein
Sent: Wednesday, November 8, 2006  2:07 AM

Why not, but how would those literals be different from enums? We
already have a kind of such "literals" as parameterless functions
returning objects of the private type.

How could we define "string literals" (or aggregates) for private types?

What kind of literals are envisaged after all?

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  3:12 PM

> How could we define "string literals" (or aggregates) for private types?
>
> What kind of literals are envisaged after all?

The idea is that the programmer provides a function that converts from the
source representation to the type, and this function is implicitly called when
a literal appears in the source code.  Perhaps:

    function My_Literal_Function (X : String) return My_Time_Type;
    for My_Time_Type'Literal use My_Literal_Function;

Then:

    X : My_Time_Type := "June 1, 2006, at 10 o'clock";

would be equivalent to:

    X : My_Time_Type := My_Literal_Function("June 1, 2006, at 10 o'clock");

Or:

    function Lit (X : String) return Bignum;
    for Bignum'Literal use Lit;

    X : Bignum := (2 ** 100) - 1_000_000_000_000_000_000_000_000_000_000;

One could do similar things for record aggregates and extension aggregates.
Array aggregates are tricky.

The overload resolution rules would have to be changed incompatibly.
Currently, in P(123), the 123 can be used to choose a P that takes
Integer over some non-integer type.  That call would have to be
ambiguous.

****************************************************************

From: Alexander E. Kopilovich
Sent: Wednesday, November 8, 2006  9:27 PM

> How could we define "string literals" (or aggregates) for private types?
>
> What kind of literals are envisaged after all?

and Robert A. Duff replies:

>The idea is that the programmer provides a function that converts from the
>source representation to the type, and this function is implicitly called when
>a literal appears in the source code.  Perhaps:
>
>    function My_Literal_Function (X : String) return My_Time_Type;
>    for My_Time_Type'Literal use My_Literal_Function;

Yes, I proposed something of this kind here 3 years ago (and that proposal
received the honorary status "no action" on 03-12-05):

  http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ACs/AC-00090.TXT?rev=1.2

****************************************************************

From: Pascal Obry
Sent: Wednesday, November 8, 2006  3:37 AM

Randy Brukardt wrote:
> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

Looks like I'm not making myself clear.

First of all, I'm not searching for maximum performance. I'm just trying to
avoid maximum performance degradation. That's quite different to me.

Secondly, I'd like also to point out that if the unbounded string is
huge, converting it to a String might not be an option.

Last, I'm not pushing to have this in Ada 2005. I raised an issue and
everybody seems to be working hard to find arguments to dismiss it. Just
to be clear, I'm perfectly fine to have this issue dropped right now or
scheduled for the next amendment.

> But remember that any change to the standard packages is (potentially)
> incompatible. We need a strong justification to introduce incompatibilities.
> We took a somewhat lower hurdle for incompatibilities in the Amendment,
> because it represented a major update and we expected users to be
> unsurprised by minor glitches from rare incompatibilities.

I understand, but in the current case I don't see what kind of
incompatibilities could be introduced.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  5:35 PM

...
> Last, I'm not pushing to have this in Ada 2005. I raised an issue and
> everybody seems to be working hard to find arguments to dismiss it. Just
> to be clear, I'm perfectly fine to have this issue dropped right now or
> scheduled for the next amendment.

Oh, OK. I naturally assumed that you were looking for a change sooner than
10 years from now, as we're not intentionally looking for new Amendment
ideas now. (Of course, they sometimes come up organically, as in the other
thread that's going on now. They'll get filed somewhere for future
reference.)

> > But remember that any change to the standard packages is (potentially)
> > incompatible. We need a strong justification to introduce incompatibilities.
> > We took a somewhat lower hurdle for incompatibilities in the Amendment,
> > because it represented a major update and we expected users to be
> > unsurprised by minor glitches from rare incompatibilities.
>
> I understand, in the current case I don't see what kind of
> incompatibilities could be introduced.

Pretty much any change to a predefined package can cause problems if the
package is USEd. And it's pretty common to reference the predefined packages
with a use clause. The problem occurs if there is a user-defined routine
with the same name in some package that is used as well. In that case,
adding a new routine can make existing calls ambiguous. Worse, child
packages of Unbounded can have the behavior of their calls changed silently
(the new routine, rather than the user-defined one, would be called, as the
new one would be directly visible and that has priority over any
use-visibility).

Obviously, it's not particularly likely for there to be something called
Find_Token in user code; but my experience is that the names of predefined
routines often get "borrowed" for other purposes (they tend to be good,
simple names, and programmers are familiar with them). And, as I said before,
it's not clear that we're willing to have any unnecessary incompatibilities
when we're purely in bug-fixing mode (as opposed to Amendment mode).

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  2:58 PM

> I understand, in the current case I don't see what kind of
> incompatibilities could be introduced.

Whenever a new subprogram is added to a package, it causes an incompatibility.
In particular, if another subprogram with the same name and profile exists in
some user's package, and both packages have use_clauses, then calls to the
user's subprogram become illegal, due to the name conflict.

But it's hardly a reason to say "never add a subprogram to a predefined
package"!

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  7:18 PM

> Obviously, it's not particularly likely for there to be something called
> Find_Token in user code; ...

Actually, that's not so obvious.  Pascal wants Find_Token-with-From.
If he doesn't get it from the ARG, I'd say it's quite likely that he
will declare it in his own package!  So if ARG adds it, it _will_
conflict.  Whether he will consider that a bug or a feature is an
interesting question.  ;-)

>...but my experience is that the names of predefined
> routines often get "borrowed" for other purposes...

Well, OK, but if it's "for other purposes", it has a different profile, and
therefore won't conflict.  (Presuming it's overloadable, as is the case for
subprograms.)

>... (they tend to be good,
> simple names, and programmers are familar with them). And, as I said before,
> it's not clear that we're willing to have any unnecessary incompatibilities
> when we're purely in bug-fixing mode (as opposed to Amendment mode).

It's a judgement call.  I have no strong opinion one way or 'tother here.
I don't think the possibility of name conflicts should absolutely rule
out additions to predefined packages.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  7:38 PM

> It's a judgement call.  I have no strong opinion one way or 'tother here.
> I don't think the possibility of name conflicts should absolutely rule
> out additions to predefined packages.

Well, I'd agree personally, but the ARG has come down on the side of
compatibility in Ada 95 vs. Ada 2007 changes. As I'm sure you know, GNAT has
pragmas and switches to ensure that the new subprograms are not used by Ada
95 programs -- and that was discussed and required by the ARG. I don't see
how this case (or any other case not involving a clear bug) differs from
that decision - Ada 2007 (or Ada 2005 if you prefer) is frozen now and I
don't think we should be making random incompatible changes other than to
fix bugs.

****************************************************************

From: Pascal Leroy
Sent: Thursday, November 9, 2006  2:07 AM

> It's a judgement call.  I have no strong opinion one way or
> 'tother here. I don't think the possibility of name conflicts
> should absolutely rule out additions to predefined packages.

I am not too concerned about name conflicts (I believe that they are
extremely improbable) but I am concerned about portability.  If we add new
subprograms now, it is not clear if/when they will be incorporated in
compilers.  So programs that use the new and improved Find_Token may not
port.  Not a good thing.

On the other hand, there aren't many compiler technologies left...

****************************************************************

From: Pascal Obry
Sent: Thursday, November 9, 2006  2:56 PM

On the other hand, we are talking about a trivial implementation: 5
minutes for the implementation, 15 minutes to add a non-regression test!
So I don't see a portability problem here; at least vendors won't have
to work hard to support this.

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  6:54 PM

I'd argue with your numbers (they're several orders of magnitude low), but
they're irrelevant in any case (as we've discussed in the ARG several
times). Vendors don't release new versions of compilers for every 10 minute
change that comes from the ARG! Depending on the vendor, compiler releases
require a lot of QA testing, documentation work, and the like. Often,
release cycles are over a year or more long. Moreover, some vendors (and
most users) only use completed ISO standards for their work (ignoring ARG
rulings in between). Even if this change was adopted at the upcoming ARG
meeting, it would not appear in a published standard for several more years.

So (if adopted now) there would be a period (probably a long period) where
some implementations implemented the change and some did not. This would
cause a portability issue, as Pascal Leroy pointed out. Moreover, it would
mean that cautious users could not use the new routine (and most likely,
many of them would not even know it exists, since it would not appear in the
Standard). This is precisely the situation that the ARG voted to not allow
to happen with Ada 95 compilers vis-a-vis the new Index functions. I don't
see why Ada 2005 compilers should be any different. (Indeed, I would be very
upset if we were to go ahead with this subprogram, but continue to not allow
a similar incompatibility in Ada 95 compilers. The effect of that is to
require a significant amount of work to allow routines in the runtime to be
accessed or invisible depending on a compiler switch -- a *lot* more work
than "5 minutes for the implementation".)

****************************************************************

From: Dan Eilers
Sent: Thursday, November 9, 2006  7:20 PM

> So (if adopted now) there would be a period (probably a long period) where
> some implementations implemented the change and some did not. This would
> cause a portability issue, as Pascal Leroy pointed out.  ...

This portability concern would seem to apply to just about any
non-editorial AI ever considered by the ARG.  Are you suggesting
that the ARG should stop considering non-editorial AI's just because
implementers may implement them at different times?  or is this
particular issue somehow special?

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  7:46 PM

No, of course not. Certainly, the concern doesn't apply to Amendment-class
AIs (because they won't be implemented now, and when they are implemented it
will be as part of a new version of the language). It does apply to all
other AIs. But, most AIs are upwards compatible (while additions/changes to
the standard library are not). For instance, adding the missing wording
that Adam pointed out is a compatible change (it's unlikely that anyone
would have intentionally implemented anything other than the rules for
instantiation, especially as the rules were correct in Ada 95). Those that
aren't fix significant bugs in the Standard or omissions where it is not
clear what an implementer should do. (In the latter case, the AI actually
increases compatibility in the long run.)

If there are AIs that don't fit in any of these categories, and they cause
incompatibilities, then they probably should not be adopted (or should be
reclassified as Amendment AIs).

****************************************************************

From: Pascal Leroy
Sent: Friday, November 10, 2006  1:42 AM

> Vendors don't release
> new versions of compilers for every 10 minute change that
> comes from the ARG! Depending on the vendor, compiler
> releases require a lot of QA testing, documentation work, and
> the like. Often, release cycles are over a year or more long.

Not to mention that, once a release is out, users don't rush to adopt it.
We still have users happily using a version that we released in 2000, and
they won't move to more recent stuff for fear of destabilizing their
environment.  These big projects have a huge inertia.

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 11, 2010  11:18 PM

One of my action items was to create wording for the additional Find_Token
routine that we think should be added for consistency.

Here's what I came up with:

   procedure Find_Token (Source : in String;
                         Set    : in Maps.Character_Set;
                         From   : in Positive;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

Find_Token returns in First and Last the indices of the beginning and end of the
first slice of Source where First >= From, all of the elements of the slice
satisfy the Test condition, and such that the elements (if any) immediately
before and after the slice do not satisfy the Test condition. If no such slice
exists, then the value returned for Last is zero, and the value returned for
First is From; however, if From is not in Positive then Constraint_Error is
raised.

Unfortunately, it's not clear this is what is intended. What is supposed to
happen if From is in the middle of a token? Consider:

Source(1..7) := "  345  ";
Set := " ";
Find_Token (Source, Set, From => 3, Test => Outside, First => First, Last => Last);
-- After this call, First = 3, Last = 5.
Find_Token (Source, Set, From => 4, Test => Outside, First => First, Last => Last);
-- After this call, First = 4, Last = 0.

The latter result requires some explanation. The wording requires three things
about a slice: First >= From, all the elements in the slice satisfy the test,
*and* the elements immediately before and after the slice do not satisfy the
test condition. The slice 3 .. 5 fails the first test, and the slice 4 .. 5
fails that last test (because the element at From-1 *does* satisfy the test).

I considered an alternative wording where the string would act as if it were
truncated at From. That, however, has the effect of possibly splitting tokens,
which seems weird. But the defined semantics is weird, too. This seems like a
possible reason why we didn't add From to this routine in the first place.

I have no idea which of these semantics is right. Index doesn't care what
precedes From, so it is no help. Returning "45" from the string "  345  " seems
wrong. OTOH, the similar call: Find_Token (Source(4..7), Set, Test => Outside,
First => First, Last => Last); does return First = 4, Last = 5.

Thoughts? And if you want the alternative wording, how would you word it? I had
enough trouble getting the above wording to make sense.

****************************************************************

From: Bob Duff
Sent: Friday, February 12, 2010  8:41 AM

> Find_Token returns in First and Last the indices of the beginning and
> end of the first slice of Source where First >= From, all of the
> elements of the slice satisfy the Test condition, and such that the
> elements (if any) immediately before and after the slice do not
> satisfy the Test condition. If no such slice exists, then the value
> returned for Last is zero, and the value returned for First is From;
> however, if From is not in Positive then Constraint_Error is raised.

The existing wording "immediately before and after" is kind of bogus, IMHO.
"Immediately before" is implied by "first slice".  And the point of "after" is
to require it to be the longest such slice.  I would have worded it using
"longest such slice" or something like that, but I don't suggest we change it
now.

The "if From is not in Positive" part goes without saying, because the subtype
of From is Positive.

Can't we just say, "equivalent to Find_Token(Source(From..Source'Last),
Set, Test, First, Last)"?

Or, "does the same thing as the previous Find_Token procedure, passing
Source(From..Source'Last) as the Source parameter".

> Unfortunately, it's not clear this is what is intended.

Indeed, it's clear that it's not what is intended.  ;-)

>...What is supposed to
> happen if From is in the middle of a token?

It usually won't be, because the intended use is to loop repeatedly finding
tokens.

But if it is, ignore that fact -- you don't want it to look at any character
before From.  You told it to start searching at From, so that's what it should
do.

> Source(1..7) := "  345  ";
> Set := Maps.To_Set (" ");
> Find_Token (Source, Set, From => 3, Test => Outside, First => First,
> Last => Last);
> -- After this call, First = 3, Last = 5.

Right.

> Find_Token (Source, Set, From => 4, Test => Outside, First => First,
> Last => Last);
> -- After this call, First = 4, Last = 0.

We want this to return 4..5, not 4..0.

> I considered an alternative wording where the string would act like
> was truncated at From.

That's what you want.

>... That however has the effect of possibly splitting tokens, which
> seems weird.

I don't think it's a problem.

>...But
> the defined semantics is weird, too. This seems like a possible reason
> why we didn't add From to this routine in the first place.
>
> I have no idea which of these semantics is right. Index doesn't care
> what precedes From, so it is no help.

It is help -- Find_Token should also not care what precedes From.
Note that the existing Find_Token doesn't care what precedes Source'First.  It's
the "immediately before and after" wording that is misleading you -- there's
nothing immediately before Source'First.

> Returning "45" from the string "  345  " seems wrong. OTOH, the
> similar
> call:
> Find_Token (Source(4..7), Set, Test => Outside, First => First, Last
> => Last); does return First = 4, Last = 5.
>
> Thoughts? And if you want the alternative wording, how would you word it?

See my suggestion above.

>...I had enough trouble getting the above wording to make sense.

Pascal Obry started all this.  I suggest you verify the wording with him.
I don't know how to do that, since we (annoyingly) don't allow cc's, as if only
ARG members may have relevant expertise.

****************************************************************

From: Randy Brukardt
Sent: Friday, February 12, 2010  1:50 PM

...
> Can't we just say, "equivalent to
> Find_Token(Source(From..Source'Last),
> Set, Test, First, Last)"?

Not if we want to be consistent. All of the Index routines have the version with
a From parameter as the one that does the defining. I don't know why I did that
in hindsight, but too late now.

Anyway, part of this AI is to change the existing Find_Token to say:

   Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).

So the above suggestion would look pretty silly. :-)

> Or, "does the same thing as the previous Find_Token procedure, passing
> Source(From..Source'Last) as the Source parameter".

And so would this. Besides, the version with From comes first (the current one
follows it). This again follows what I did with Index.

> > Unfortunately, it's not clear this is what is intended.
>
> Indeed, it's clear that it's not what is intended.  ;-)

I still disagree. The invariant is that the longest token is returned. The only
reason that characters before Source'First are ignored is because they don't
exist. When this happens in our compiler, I work both ways from the starting
point to ensure that the entire token is determined correctly. (I forget the
case where that came up; it probably was related to error handling or debugging
or something like that where the source address is not necessarily accurately
known.)

> >...What is supposed to
> > happen if From is in the middle of a token?
>
> It usually won't be, because the intended use is to loop repeatedly
> finding tokens.

True enough. But that doesn't give us the ability to ignore what will happen in
that case.

> But if it is, ignore that fact -- you don't want it to look at any
> character before From.  You told it to start searching at From, so
> that's what it should do.

I'm still dubious, although perhaps this is a don't care case, and maybe the
equivalence with a slice is felt to be compelling. I don't find it compelling
because it ruins the invariant.

...
> > I have no idea which of these semantics is right. Index doesn't care
> > what precedes From, so it is no help.
>
> It is help -- Find_Token should also not care what precedes From.
> Note that the existing Find_Token doesn't care what precedes
> Source'First.  It's the "immediately before and after"
> wording that is misleading you -- there's nothing immediately before
> Source'First.

I don't think that's "misleading" me; it defines the invariant of the routine.
You're proposing to abandon that invariant and substitute another. Maybe that's
OK, but only because it isn't the intended use of the routine.

> > Returning "45" from the string "  345  " seems wrong. OTOH, the
> > similar call:
> > Find_Token (Source(4..7), Set, Test => Outside, First => First, Last
> > => Last); does return First = 4, Last = 5.
> >
> > Thoughts? And if you want the alternative wording, how
> would you word it?
>
> See my suggestion above.

Doesn't work. Please try again. :-)

> >...I
> > had enough trouble getting the above wording to make sense.
>
> Pascal Obry started all this.  I suggest you verify the wording with
> him.
> I don't know how to do that, since we (annoyingly) don't allow cc's,
> as if only ARG members may have relevant expertise.

I find this to be an ARG-level angels on a pinhead sort of discussion. Real
users don't care about the exact wording, they only care that it does what they
expect in the normal case. Which either semantics will do. So I doubt that there
is an opinion here.

And in any case, there isn't an absolute ban on ccs. They're allowed to
individual users (*not* mailing lists) when they're directly relevant to the
discussion. The bigger problem with them is that people forget to continue them,
or forget to cc the list. So it's better to avoid them, and in any case, real
technical discussion belongs on Ada-Comment if the public is to be involved. But
I didn't think the public would care about this angels-on-the-head-of-a-pin
discussion.

****************************************************************

From: Bob Duff
Sent: Friday, February 12, 2010  2:56 PM

It would be good if other ARG members would weigh in on this earth-shattering
issue.  ;-)

...
> Not if we want to be consistent. All of the Index routines have the
> version with a From parameter as the one that does the defining. I
> don't know why I did that in hindsight, but too late now.
>
> Anyway, part of this AI is to change the existing Find_Token to say:
>
>    Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
>
> So the above suggestion would look pretty silly. :-)

So don't do that.  ;-)

We can define one in terms of the other, or the other in terms of the one.  I
agree it would be nice to be consistent with Index, but it wouldn't be the end
of the world to do it the other way around.

> I still disagree. The invariant is that the longest token is returned.

The longest one starting at From.

It seems clear to me that if you specify From, you don't want to look at
characters before From. I can't imagine why you think otherwise, so I don't know
how to argue against that -- it just seems obvious to me that From indicates the
starting point of the search.

> I'm still dubious, although perhaps this is a don't care case, and
> maybe the equivalence with a slice is felt to be compelling. I don't
> find it compelling because it ruins the invariant.

I don't think it's a don't care case.

And I don't see why you want to apply the invariant for the old Find_Token to
the new one. (Don't you mean "postcondition", not "invariant"?)

> Doesn't work. Please try again. :-)

It works.  You just don't like it because it's inconsistent.

> I find this to be an ARG-level angels on a pinhead sort of discussion.
> Real users don't care about the exact wording, they only care that it
> does what they expect in the normal case. Which either semantics will
> do. So I doubt that there is an opinion here.

We're not just arguing about wording.  We're also arguing about what it should
do.  I think users care about that.

OK, if you insist on consistency, define the semantics of the new one like this:

    First is the index of the first character in Source(From..Source'Last) that
    satisfies the Test condition.  Last is the largest index such that all
    characters in Source(First..Last) satisfy the Test condition.  If no
    characters in Source(From..Source'Last) satisfy the Test condition, First
    is From, and Last is 0.

The last part, about C_E, is no longer needed.

And the old one like this (as you said):

    Equivalent to Find_Token (Source, Set, Source'First, Test, First, Last).
    [AARM Note: If Source'First is not in Positive, which can
    only happen for an empty string, this will raise Constraint_Error.]

This wording reflects my (obviously correct! ;-)) opinion about what the new one
with From should do, and it doesn't change what the old one without From does.
If you don't agree on the behavior, you won't like my wording.
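A sketch of this wording as executable pseudocode (Python, the name find_token_from invented for illustration) shows that it gives 4 .. 5 in the disputed case:

```python
def find_token_from(source, test, frm):
    """Model of the wording above (1-based indices): First is the index of
    the first character in source(frm .. len) satisfying `test`; Last is
    the largest index such that all of source(First .. Last) satisfy
    `test`; (frm, 0) if no character from frm onward satisfies `test`."""
    n = len(source)
    first = next((i for i in range(frm, n + 1) if test(source[i - 1])), None)
    if first is None:
        return (frm, 0)
    last = first
    while last < n and test(source[last]):   # source[last] is index last+1
        last += 1
    return (first, last)

outside_blank = lambda c: c != ' '
print(find_token_from("  345  ", outside_blank, 4))  # (4, 5): From splits the token
print(find_token_from("  345  ", outside_blank, 6))  # (6, 0): nothing after From
```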

****************************************************************

From: Steve Baird
Sent: Friday, February 12, 2010  3:14 PM

> It would be good if other ARG members would weigh in on this
> earth-shattering issue.  ;-)

It looks like you two are converging nicely on a solution. On the main question
of whether this form is equivalent to passing in a slice and on how the function
should behave, I agree with Bob.

And I agree with Randy (and it sounds like Bob does too, now that Randy has
identified the issue) about the need for consistency in the wording and for
avoiding circular definitions.

****************************************************************

From: Tucker Taft
Sent: Friday, February 12, 2010  3:19 PM

I agree with Bob that if you specify From, it is as though the characters before
From don't exist at all.  You shouldn't be looking at them.

You use an operation like this to walk your way through a string.  You wouldn't
want a token returned from a second call to overlap the token returned from the
first call, presuming you set "From" to one past the end of the first token
returned.
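The walking pattern described above can be sketched as follows (a Python model assuming the slice-like semantics Bob proposes; both function names are invented). Because each search restarts one past the previous token's Last, successive tokens can never overlap:

```python
def find_token_from(source, test, frm):
    # minimal model of Find_Token with a From parameter
    # (1-based indices, slice-like semantics)
    n = len(source)
    first = next((i for i in range(frm, n + 1) if test(source[i - 1])), None)
    if first is None:
        return (frm, 0)
    last = first
    while last < n and test(source[last]):
        last += 1
    return (first, last)

def all_tokens(source, test):
    """Walk the string, restarting each search at one past the previous
    token's Last, collecting every token in order."""
    frm, result = 1, []
    while frm <= len(source):
        first, last = find_token_from(source, test, frm)
        if last == 0:
            break
        result.append(source[first - 1:last])
        frm = last + 1
    return result

print(all_tokens("  one two  three ", lambda c: c != ' '))
# ['one', 'two', 'three']
```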

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 12, 2010  3:33 PM

> I still disagree. The invariant is that the longest token is returned.
> The only reason that characters before Source'First are ignored is
> because they don't exist. When this happens in our compiler, I work
> both ways from the starting point to ensure that the entire token is
> determined correctly. (I forget the case where that came up; it
> probably was related to error handling or debugging or something like
> that where the source address is not necessarily accurately known.)

Here is an example: I parse a command line, and there is a -o option to redirect
output. Following stupid Unix convention, no space between -o and file name.

My scanner would go like this:
1) A '-': this is an option
2) A 'o': let's get the rest of the string up to the first space.
Clearly, I want to get the rest of the token after the 'o'. *I* decide where the
real token starts.
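This -o case can be sketched directly (a Python model with an invented helper and a made-up command line). Note that the strict "maximal run" reading would return nothing here, because the character just before From belongs to the same run; the slice-like reading returns exactly the file name:

```python
def find_token_from(source, test, frm):
    # minimal model of Find_Token with a From parameter
    # (1-based indices, slice-like semantics)
    n = len(source)
    first = next((i for i in range(frm, n + 1) if test(source[i - 1])), None)
    if first is None:
        return (frm, 0)
    last = first
    while last < n and test(source[last]):
        last += 1
    return (first, last)

line = "-oout.txt -v"          # "-o" glued to the file name, Unix style
assert line[0] == '-'          # step 1: it's an option
assert line[1] == 'o'          # step 2: the file name starts at index 3
first, last = find_token_from(line, lambda c: c != ' ', 3)
print(line[first - 1:last])    # out.txt -- the caller decided where the
                               # token starts; index 2 is simply ignored
```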

****************************************************************

Questions? Ask the ACAA Technical Agent