Version 1.2 of ai05s/ai05-0031-1.txt

!standard A.4.3(16)          06-12-14 AI05-0031-1/00
!standard A.4.3(67)
!standard A.4.3(68/1)
!standard A.4.4(51)
!standard A.4.5(46)
!class Amendment 06-11-03
!status work item 06-11-03
!status received 06-11-03
!priority Low
!difficulty Easy
!subject Add a From parameter to Find_Token
!summary
(See proposal.)
!problem
** See mail **
!proposal
[Add a From parameter to Find_Token in all of the predefined string packages]
!wording
(** TBD **)
!discussion
** TBD **
!example
!ACATS test
!appendix

From: Pascal Obry
Sent: Monday, October 30, 2006  2:21 PM

I just noticed that the Find_Token in (Fixed, Unbounded and Bounded) has
no version with a From index. This is especially important when
iterating over a long string to find multiple tokens. Such a From index
(the index at which to start looking) has been added to the Index routines.
Why not for Find_Token ?

The only version is:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : out Positive;
                         Last   : out Natural);

I would like to propose this :

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         From   : in Positive;
                         First  : out Positive;
                         Last   : out Natural);

Here From is the starting position at which to look for the token. An
alternate solution could be to use First:

   procedure Find_Token (Source : in Unbounded_String;
                         Set    : in Maps.Character_Set;
                         Test   : in Membership;
                         First  : in out Positive;
                         Last   : out Natural);


In this case the First parameter is changed to mode "in out", the
initial value being the starting position to look for the given token.
This last solution looks better to me.

Thoughts ?

****************************************************************

From: Adam Beneschan
Sent: Friday, November 3, 2006  1:07 PM

> I just noticed that the Find_Token in (Fixed, Unbounded and Bounded) has
> no version with a From index. This is especially important when
> iterating over a long string to find multiple tokens. Such a From index
> (the index at which to start looking) has been added to the Index routines.
> Why not for Find_Token ?

I just checked and Find_Token is not mentioned at all in AI-301
(including all of the e-mail).  Looks to me like nobody else noticed
it.  I think you're right, this is an omission.

> The only version is:
> 
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : out Positive;
>                          Last   : out Natural);
> 
> I would like to propose this :
> 
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          From   : in Positive;
>                          First  : out Positive;
>                          Last   : out Natural);
> 
> Here From is the starting position at which to look for the token. An
> alternate solution could be to use First:
> 
>    procedure Find_Token (Source : in Unbounded_String;
>                          Set    : in Maps.Character_Set;
>                          Test   : in Membership;
>                          First  : in out Positive;
>                          Last   : out Natural);
> 
> 
> In this case the First parameter is changed to mode "in out", the
> initial value being the starting position to look for the given token.
> This last solution looks better to me.
> 
> Thoughts ?

I definitely like the first solution (separate From and First
parameters) better.  If the second solution were adopted, I think a
call to it would look confusing, since the parameter would have to be
a variable used for one meaning before the call and a different
(although vaguely similar) meaning after the call.  Anyway, I've seen
code that calls routines like that and I always end up scratching my
head trying to figure out what the heck is going on.

****************************************************************

From: Randy Brukardt
Sent: Friday, November 3, 2006  11:28 PM

> I just checked and Find_Token is not mentioned at all in AI-301
> (including all of the e-mail).  Looks to me like nobody else noticed
> it.

I'm not sure that anyone knows that Find_Token exists or what it does. So
it's not surprising that it didn't immediately come to mind. Anyway, I think
you could make the argument that the "From" parameter is useful for pretty
much all of the Unbounded string routines, but it is really easy for that to
turn into feeping creaturism. (It's hard to find much use for most of the
Unbounded string routines anyway.) So where do you draw the line?

I suspect that adding much more to AI-301 would have killed it (it was a
tough sell originally), so I think it was best that Find_Token was left out.
That doesn't mean that we shouldn't think about adding it in the future.

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006 12:43 AM

> I'm not sure that anyone knows that Find_Token exists or what it does. So
> it's not surprising that it didn't immediately come to mind. Anyway, I think
> you could make the argument that the "From" parameter is useful for pretty
> much all of the Unbounded string routines, but it is really easy for that to

Why all? Apart from Index and Find_Token, which can be used repeatedly to
look for patterns in a string, I don't see the need for the others.

> turn into feeping creaturism. (It's hard to find much use for most of the
> Unbounded string routines anyway.) So where do you draw the line?

Hard to find much use? OK, I must be different then :) Frankly, this was
quite a nice addition to Ada 95, and there are services in
Ada.Strings.Unbounded that I use all the time! I definitely think that
improving it is very important, hence my Find_Token proposal. The better
the interface, the more it will be used!

The solution to my problem today is to convert the Unbounded_String to a
String and pass successive slices of it to Find_Token. This is not
acceptable for a language like Ada!
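A minimal sketch of that workaround, assuming Ada 2005 or later. Find_Token_From is a hypothetical helper with the profile of the first proposal, written for String rather than Unbounded_String; it is not a predefined subprogram. Raising Index_Error when From is outside Source'Range is an assumption borrowed from the Index functions that take From. The text is converted once with To_String, and because a String slice keeps the indices of the original, the First and Last returned for a slice are already positions in the whole string.

```ada
with Ada.Text_IO;
with Ada.Strings;            use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;       use Ada.Strings.Maps;
with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;

procedure Token_Loop is

   --  Hypothetical helper (not predefined) with a From parameter,
   --  built on the existing Ada.Strings.Fixed.Find_Token.
   procedure Find_Token_From
     (Source : in String;
      Set    : in Character_Set;
      Test   : in Membership;
      From   : in Positive;
      First  : out Positive;
      Last   : out Natural) is
   begin
      --  Assumed convention, borrowed from the Index functions with From.
      if From not in Source'Range then
         raise Index_Error;
      end if;
      --  A slice keeps Source's indices, so First and Last come back
      --  as positions in Source itself.
      Ada.Strings.Fixed.Find_Token
        (Source => Source (From .. Source'Last),
         Set    => Set,
         Test   => Test,
         First  => First,
         Last   => Last);
   end Find_Token_From;

   Text   : constant Unbounded_String :=
              To_Unbounded_String ("  alpha beta   gamma ");
   Spaces : constant Character_Set := To_Set (' ');
   S      : constant String := To_String (Text);  --  the one conversion
   From   : Positive := S'First;
   First  : Positive;
   Last   : Natural;
   Found  : Natural := 0;                          --  tokens seen so far
begin
   while From <= S'Last loop
      Find_Token_From (S, Spaces, Outside, From, First, Last);
      exit when Last = 0;                           --  no further token
      Ada.Text_IO.Put_Line (S (First .. Last));
      Found := Found + 1;
      From  := Last + 1;                            --  resume after token
   end loop;
end Token_Loop;
```

On the sample text the loop prints alpha, beta and gamma, one per line. The cost Pascal objects to is the To_String copy of the whole text, which this sketch pays once rather than per token.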

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  2:22 PM

> The solution to my problem today is to convert the Unbounded_String to a
> String and pass successive slices of it to Find_Token. This is not
> acceptable for a language like Ada!

Why not use Ada.Strings.Unbounded.Slice?
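A minimal sketch of that suggestion, assuming Ada 2005 or later: each iteration takes the remaining tail with Ada.Strings.Unbounded.Slice and searches it with Ada.Strings.Fixed.Find_Token. Slice returns a String value, so in general every call copies the tail, which is the cost Pascal Obry raises in his reply. The index arithmetic is written so that it does not depend on the lower bound of the String that Slice returns.

```ada
with Ada.Text_IO;
with Ada.Strings;            use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Strings.Maps;
with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;

procedure Slice_Tokens is
   Text   : constant Unbounded_String :=
              To_Unbounded_String ("alpha beta gamma");
   Spaces : constant Ada.Strings.Maps.Character_Set :=
              Ada.Strings.Maps.To_Set (' ');
   From   : Positive := 1;
   Found  : Natural  := 0;   --  tokens seen so far
begin
   while From <= Length (Text) loop
      declare
         --  Slice returns a String holding positions From .. Length;
         --  in general that means copying the remaining tail each time.
         Tail  : constant String := Slice (Text, From, Length (Text));
         First : Positive;
         Last  : Natural;
      begin
         Ada.Strings.Fixed.Find_Token
           (Source => Tail,
            Set    => Spaces,
            Test   => Outside,
            First  => First,
            Last   => Last);
         exit when Last = 0;   --  no further token
         declare
            --  Map indices in Tail back to positions in Text, whatever
            --  lower bound Slice gave its result.
            Token_First : constant Positive := From + (First - Tail'First);
            Token_Last  : constant Positive := From + (Last - Tail'First);
         begin
            Ada.Text_IO.Put_Line (Slice (Text, Token_First, Token_Last));
            Found := Found + 1;
            From  := Token_Last + 1;
         end;
      end;
   end loop;
end Slice_Tokens;
```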

****************************************************************

From: Pascal Obry
Sent: Monday, November 6, 2006  2:36 PM

Performance?

****************************************************************

From: Jeffrey Carter
Sent: Monday, November 6, 2006  8:18 PM

Then you probably shouldn't be using Ada.Strings.Unbounded.

****************************************************************

From: Pascal Obry
Sent: Tuesday, November 7, 2006  1:33 AM

Just because unbounded strings are slower than standard strings doesn't
mean I should accept an even worse implementation of Find_Token.
Dealing with unbounded strings directly is OK; it's the conversions to and
from String that hurt performance. I want to avoid that.

Note also that with a good cache, the unbounded strings are not that
slow. See the GNAT implementation for example.

And we are speaking of a very simple addition; it looks worth it to me.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, November 7, 2006  6:25 PM

> Just because unbounded strings are slower than standard strings doesn't
> mean I should accept an even worse implementation of Find_Token.
> Dealing with unbounded strings directly is OK; it's the conversions to and
> from String that hurt performance. I want to avoid that.

Then Jeff is right. To use the unbounded strings package requires lots of
conversions back and forth, simply because most of the operations in the
unbounded strings package take String, not Unbounded_String, arguments. For
instance, my spam filter does a lot of searching for patterns (stored as
lists of unbounded strings) in messages (stored as lists of unbounded
strings). The patterns have to be converted to strings on every use - ugh.
[Yes, I could have stored the patterns as regular strings, but then I'd have
to do a lot of memory management on the lists of patterns. And if I did
that, I would necessarily convert the messages (since they're stored in the
same type) to regular strings as well -- and I wouldn't use unbounded
strings at all.]

So if you need maximum performance, you can't use unbounded strings. If the
memory management aspects are more important to you than performance, then
the extra conversions cannot be a big deal. You can't have it both ways
(mainly because Ada doesn't have a way to give string literals to private
types -- but even if it did, you'd need a package quite different than
unbounded strings).

> Note also that with a good cache, the unbounded strings are not that
> slow. See the GNAT implementation for example.
>
> And we are speaking of a very simple addition; it looks worth it to me.

But remember that any change to the standard packages is (potentially)
incompatible. We need a strong justification to introduce incompatibilities.
We accepted a somewhat lower hurdle for incompatibilities in the Amendment,
because it represented a major update and we expected users to be
unsurprised about minor glitches from rare incompatibilities.

Note that we do *not* allow these new routines in Ada 95 implementations.
That's specifically because of the compatibility concerns -- we do not want
programs that work on one Ada 95 compiler to fail on another because of the
presence or absence of these new routines.

But the Amendment is done now, and it is in use (at least with GNAT).
Changes now have a higher burden. Of course, if there is an actual bug
(wrong mode, wrong type, etc.), that should be fixed, but we're not in the
business of making changes that might break real, existing programs simply
because it seems inconsistent and it is a "simple addition".

If this comment had been made a year ago while the Amendment was still
being finalized, the change might very well have been made. But the
Amendment is frozen (and mostly approved) and in use. In my opinion,
nice-to-haves have to wait for the next revision/Amendment. Whenever that
is.

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, November 7, 2006  8:11 PM

> Then Jeff is right. To use the unbounded strings package requires lots of
> conversions back and forth, simply because most of the operations in the
> unbounded strings package take String, not Unbounded_String, arguments. For
> instance, my spam filter does a lot of searching for patterns (stored as
> lists of unbounded strings) in messages (stored as lists of unbounded
> strings). The patterns have to be converted to strings on every use - ugh.
> [Yes, I could have stored the patterns as regular strings, but then I'd have
> to do a lot of memory management on the lists of patterns. And if I did
> that, I would necessarily convert the messages (since they're stored in the
> same type) to regular strings as well -- and I wouldn't use unbounded
> strings at all.]

What he said.

> If this comment had been made a year ago while the Amendment was still
> being finalized, the change might very well have been made. But the
> Amendment is frozen (and mostly approved) and in use. In my opinion,
> nice-to-haves have to wait for the next revision/Amendment. Whenever that
> is.

My guess is 2019.

****************************************************************

From: John Barnes
Sent: Wednesday, November 8, 2006  1:27 AM

> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

One of the features that Tuck proposed when doing Ada 9x was to allow the
definition of literals for private types. I thought it was a wonderful idea
and still miss it. But it was killed at an early stage.

A thought for Ada 2016?

****************************************************************

From: Christoph Grein
Sent: Wednesday, November 8, 2006  2:07 AM

Why not, but how would those literals be different from enums? We
already have a kind of such "literals" as parameterless functions
returning objects of the private type.

How could we define "string literals" (or aggregates) for private types?

What kind of literals are envisaged after all?

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  3:12 PM

> How could we define "string literals" (or aggregates) for private types?
> 
> What kind of literals are envisaged after all?

The idea is that the programmer provides a function that converts from the
source representation to the type, and this function is implicitly called when
a literal appears in the source code.  Perhaps:

    function My_Literal_Function (X : String) return My_Time_Type;
    for My_Time_Type'Literal use My_Literal_Function;

Then:

    X : My_Time_Type := "June 1, 2006, at 10 o'clock";

would be equivalent to:

    X : My_Time_Type := My_Literal_Function("June 1, 2006, at 10 o'clock");

Or:

    function Lit (X : String) return Bignum;
    for Bignum'Literal use Lit;

    X : Bignum := (2 ** 100) - 1_000_000_000_000_000_000_000_000_000_000;

One could do similar things for record aggregates and extension aggregates.
Array aggregates are tricky.

The overload resolution rules would have to be changed incompatibly.
Currently, in P(123), the 123 can be used to choose a P that takes
Integer over some non-integer type.  That call would have to be
ambiguous.

****************************************************************

From: Alexander E. Kopilovich
Sent: Wednesday, November 8, 2006  9:27 PM

> How could we define "string literals" (or aggregates) for private types?
> 
> What kind of literals are envisaged after all?

and Robert A. Duff replies:

>The idea is that the programmer provides a function that converts from the
>source representation to the type, and this function is implicitly called when
>a literal appears in the source code.  Perhaps:
>
>    function My_Literal_Function (X : String) return My_Time_Type;
>    for My_Time_Type'Literal use My_Literal_Function;

Yes, I proposed something of this kind here 3 years ago (and that proposal
received the honorary status "no action" on 03-12-05):

  http://www.ada-auth.org/cgi-bin/cvsweb.cgi/ACs/AC-00090.TXT?rev=1.2

****************************************************************

From: Pascal Obry
Sent: Wednesday, November 8, 2006  3:37 AM

Randy Brukardt wrote:
> So if you need maximum performance, you can't use unbounded strings. If the
> memory management aspects are more important to you than performance, then
> the extra conversions cannot be a big deal. You can't have it both ways
> (mainly because Ada doesn't have a way to give string literals to private
> types -- but even if it did, you'd need a package quite different than
> unbounded strings).

Looks like I'm not making myself clear.

First of all, I'm not looking for maximum performance. I'm just trying to
avoid a major performance degradation. That's quite different to me.

Secondly, I'd also like to point out that if the Unbounded_String is
huge, converting it to a String might not be an option.

Last, I'm not pushing to have this in Ada 2005. I raised an issue and
everybody seems to be working hard to find arguments to dismiss it. Just
to be clear, I'm perfectly fine with having this issue dropped right now or
scheduled for the next amendment.

> But remember that any change to the standard packages is (potentially)
> incompatible. We need a strong justification to introduce incompatibilities.
> We accepted a somewhat lower hurdle for incompatibilities in the Amendment,
> because it represented a major update and we expected users to be
> unsurprised about minor glitches from rare incompatibilities.

I understand, but in the current case I don't see what kind of
incompatibility could be introduced.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  5:35 PM

...
> Last, I'm not pushing to have this in Ada 2005. I raised an issue and
> everybody seems to be working hard to find arguments to dismiss it. Just
> to be clear, I'm perfectly fine with having this issue dropped right now or
> scheduled for the next amendment.

Oh, OK. I naturally assumed that you were looking for a change sooner than
10 years from now, as we're not intentionally looking for new Amendment
ideas now. (Of course, they sometimes come up organically, as in the other
thread that's going on now. They'll get filed somewhere for future
reference.)

> > But remember that any change to the standard packages is (potentially)
> > incompatible. We need a strong justification to introduce incompatibilies.
> > We took a somewhat weaker hurdle for incompatibilities in the Amendment,
> > because it represented a major update and we expected users to be
> > unsurprised about minor glitches from rare incompatibilies.
>
> I understand, in the current case I don't see what kind of
> incompatibilities could be introduced.

Pretty much any change to a predefined package can cause problems if the
package is USEd. And it's pretty common to reference the predefined packages
with a use clause. The problem occurs if there is a user-defined routine
with the same name in some package that is used as well. In that case,
adding a new routine can make existing calls ambiguous. Worse, child
packages of Unbounded can have the behavior of their calls changed silently
(the new routine, rather than the user-defined one, would be called, as the
new one would be directly visible and that has priority over any
use-visibility).
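The silent-change case can be sketched in a single file, assuming nested packages as a stand-in for Ada.Strings.Unbounded and a child of it: Lib plays the predefined package that gains a new Find_Token, My_Utils is a hypothetical user package declaring a homograph, and Lib.Client plays code in the child unit. All names here are illustrative.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Hiding_Demo is

   --  Hypothetical user package that happens to declare a Find_Token.
   package My_Utils is
      procedure Find_Token (X : Integer);
   end My_Utils;

   --  Stands in for a predefined package that has just gained a
   --  homograph of My_Utils.Find_Token.
   package Lib is
      procedure Find_Token (X : Integer);
      procedure Client;  --  plays the role of code in a child package
   end Lib;

   Called : Character := ' ';  --  records which Find_Token ran

   package body My_Utils is
      procedure Find_Token (X : Integer) is
      begin
         Called := 'U';
         Put_Line ("My_Utils.Find_Token" & Integer'Image (X));
      end Find_Token;
   end My_Utils;

   package body Lib is
      procedure Find_Token (X : Integer) is
      begin
         Called := 'L';
         Put_Line ("Lib.Find_Token" & Integer'Image (X));
      end Find_Token;

      procedure Client is
         use My_Utils;
      begin
         --  We are within the immediate scope of Lib.Find_Token, so the
         --  use-visible homograph from My_Utils is hidden (RM 8.4) and
         --  this call binds to Lib.Find_Token.
         Find_Token (1);
      end Client;
   end Lib;

begin
   Lib.Client;
end Hiding_Demo;
```

As written the program prints "Lib.Find_Token 1". Deleting Lib.Find_Token (spec and body) leaves a program that still compiles, but the same call in Client then runs My_Utils.Find_Token instead.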

Obviously, it's not particularly likely for there to be something called
Find_Token in user code; but my experience is that the names of predefined
routines often get "borrowed" for other purposes (they tend to be good,
simple names, and programmers are familiar with them). And, as I said before,
it's not clear that we're willing to have any unnecessary incompatibilities
when we're purely in bug-fixing mode (as opposed to Amendment mode).

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  2:58 PM

> I understand, but in the current case I don't see what kind of
> incompatibility could be introduced.

Whenever a new subprogram is added to a package, it causes an incompatibility.
In particular, if another subprogram with the same name and profile exists in
some user's package, and both packages have use_clauses, then calls to the
user's subprogram become illegal, due to the name conflict.

But it's hardly a reason to say "never add a subprogram to a predefined
package"!

****************************************************************

From: Robert A. Duff
Sent: Wednesday, November 8, 2006  7:18 PM

> Obviously, it's not particularly likely for there to be something called
> Find_Token in user code; ...

Actually, that's not so obvious.  Pascal wants Find_Token-with-From.
If he doesn't get it from the ARG, I'd say it's quite likely that he
will declare it in his own package!  So if ARG adds it, it _will_
conflict.  Whether he will consider that a bug or a feature is an
interesting question.  ;-)

>...but my experience is that the names of predefined
> routines often get "borrowed" for other purposes...

Well, OK, but if it's "for other purposes", it has a different profile, and
therefore won't conflict.  (Presuming it's overloadable, as is the case for
subprograms.)

>... (they tend to be good,
> simple names, and programmers are familiar with them). And, as I said before,
> it's not clear that we're willing to have any unnecessary incompatibilities
> when we're purely in bug-fixing mode (as opposed to Amendment mode).

It's a judgement call.  I have no strong opinion one way or 'tother here.
I don't think the possibility of name conflicts should absolutely rule
out additions to predefined packages.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, November 8, 2006  7:38 PM

> It's a judgement call.  I have no strong opinion one way or 'tother here.
> I don't think the possibility of name conflicts should absolutely rule
> out additions to predefined packages.

Well, I'd agree personally, but the ARG has come down on the side of
compatibility in Ada 95 vs. Ada 2007 changes. As I'm sure you know, GNAT has
pragmas and switches to ensure that the new subprograms are not used by Ada
95 programs -- and that was discussed and required by the ARG. I don't see
how this case (or any other case not involving a clear bug) differs from
that decision - Ada 2007 (or Ada 2005 if you prefer) is frozen now and I
don't think we should be making random incompatible changes other than to
fix bugs.

****************************************************************

From: Pascal Leroy
Sent: Thursday, November 9, 2006  2:07 AM

> It's a judgement call.  I have no strong opinion one way or 
> 'tother here. I don't think the possibility of name conflicts 
> should absolutely rule out additions to predefined packages.

I am not too concerned about name conflicts (I believe that they are
extremely improbable) but I am concerned about portability.  If we add new
subprograms now, it is not clear if/when they will be incorporated in
compilers.  So programs that use the new and improved Find_Token may not
port.  Not a good thing.

On the other hand, there aren't many compiler technologies left...

****************************************************************

From: Pascal Obry
Sent: Thursday, November 9, 2006  2:56 PM

On the other hand, we are talking about a trivial implementation: 5
minutes for the implementation, 15 minutes to add a non-regression test!
So I don't see a portability problem here, at least vendors won't have
hard work supporting this.

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  6:54 PM

I'd argue with your numbers (they're several orders of magnitude low), but
they're irrelevant in any case (as we've discussed in the ARG several
times). Vendors don't release new versions of compilers for every 10 minute
change that comes from the ARG! Depending on the vendor, compiler releases
require a lot of QA testing, documentation work, and the like. Often,
release cycles are a year or more long. Moreover, some vendors (and
most users) only use completed ISO standards for their work (ignoring ARG
rulings in between). Even if this change was adopted at the upcoming ARG
meeting, it would not appear in a published standard for several more years.

So (if adopted now) there would be a period (probably a long period) where
some implementations implemented the change and some did not. This would
cause a portability issue, as Pascal Leroy pointed out. Moreover, it would
mean that cautious users could not use the new routine (and most likely,
many of them would not even know it exists, since it would not appear in the
Standard). This is precisely the situation that the ARG voted to not allow
to happen with Ada 95 compilers vis-a-vis the new Index functions. I don't
see why Ada 2005 compilers should be any different. (Indeed, I would be very
upset if we were to go ahead with this subprogram, but continue to not allow
a similar incompatibility in Ada 95 compilers. The effect of that is to
require a significant amount of work to allow routines in the runtime to be
accessed or invisible depending on a compiler switch -- a *lot* more work
than "5 minutes for the implementation".)

****************************************************************

From: Dan Eilers
Sent: Thursday, November 9, 2006  7:20 PM

> So (if adopted now) there would be a period (probably a long period) where
> some implementations implemented the change and some did not. This would
> cause a portability issue, as Pascal Leroy pointed out.  ...

This portability concern would seem to apply to just about any
non-editorial AI ever considered by the ARG.  Are you suggesting
that the ARG should stop considering non-editorial AI's just because
implementers may implement them at different times?  or is this
particular issue somehow special?

****************************************************************

From: Randy Brukardt
Sent: Thursday, November 9, 2006  7:46 PM

No, of course not. Certainly, the concern doesn't apply to Amendment-class
AIs (because they won't be implemented now, and when they are implemented it
will be as part of a new version of the language). It does apply to all
other AIs. But, most AIs are upwards compatible (while additions/changes to
the standard library are not). For instance, adding the missing wording
that Adam pointed out is a compatible change (it's unlikely that anyone
would have intentionally implemented anything other than the rules for
instantiation, especially as the rules were correct in Ada 95). Those that
aren't fix significant bugs in the Standard or omissions where it is not
clear what an implementer should do. (In the latter case, the AI actually
increases compatibility in the long run.)

If there are AIs that don't fit in any of these categories, and they cause
incompatibilities, then they probably should not be adopted (or should be
reclassified as Amendment AIs).

****************************************************************

From: Pascal Leroy
Sent: Friday, November 10, 2006  1:42 AM

> Vendors don't release 
> new versions of compilers for every 10 minute change that 
> comes from the ARG! Depending on the vendor, compiler 
> releases require a lot of QA testing, documentation work, and 
> the like. Often, release cycles are a year or more long.

Not to mention that, once a release is out, users don't rush to adopt it.
We still have users happily using a version that we released in 2000, and
they won't move to more recent stuff for fear of destabilizing their
environment.  These big projects have a huge inertia.

****************************************************************
