Version 1.5 of ais/ai-00238.txt

!standard A.04.04 (101)          02-05-09 AI95-00238/03
!class binding interpretation 01-10-07
!status Amendment 200Y 02-07-09
!status WG9 Approved 02-06-21
!status ARG Approved 7-0-1 01-10-07
!status work item 00-07-10
!status received 00-07-10
!qualifier Clarification
!priority Low
!difficulty Medium
!subject What is the lower bound of Ada.Strings.Bounded.Slice?
!summary
The bounds of the string returned from Ada.Strings.Bounded.Slice are the values passed to the function as the parameters Low and High.
!question
What is the lower bound of Ada.Strings.Bounded.Slice? (Low.) The standard does not clearly say.
Here is the text of the paragraphs in question (including text added by Technical Corrigendum 1):
function Slice (Source : in Bounded_String; Low : in Positive; High : in Natural) return String;
Returns the slice at positions Low through High in the string represented by Source; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source).
This really doesn't say what the bounds of the result are. If "slice" is taken literally, then the bounds are Low .. High. But that is different from most other string subprograms returning a string, which specifically state that the lower bound is 1. Moreover, my reading of the paragraph implies that slice was meant to informally describe what is returned, not to set bounds, exceptions, and so on.
!recommendation
(See summary.)
!wording
(See corrigendum.)
!discussion
A.4.4(1) describes the string represented by Source as a string with lower bound 1 and the upper bound determined by the current length. Taking a slice of this string from Low to High clearly gives a result with bounds Low and High.
Requiring this function to return a string with lower bound 1 is appealing, as it is likely to prevent bugs (from assuming that 'Last = 'Length), and is more consistent with the other functions which return strings in Ada.Strings. However, the existing wording of the paragraph does not support this interpretation.
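The kind of bug mentioned above can be illustrated with a minimal sketch. The names Lower_Bound_Demo, Count_Blanks, Text, Part, and Slid are hypothetical and do not come from the standard; the sketch assumes only ordinary Ada slice and subtype-conversion semantics. It shows a slice whose 'Last differs from its 'Length, a loop written with 'Range so that it does not depend on the lower bound, and a conversion to a constrained subtype that slides the bounds to 1 .. 'Length.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Lower_Bound_Demo is

   --  Hypothetical helper: counts blanks without assuming S'First = 1.
   function Count_Blanks (S : String) return Natural is
      Blanks : Natural := 0;
   begin
      for I in S'Range loop            --  not "1 .. S'Length"
         if S (I) = ' ' then
            Blanks := Blanks + 1;
         end if;
      end loop;
      return Blanks;
   end Count_Blanks;

   Text : constant String := "a b c d";    --  bounds 1 .. 7
   Part : String renames Text (3 .. 7);    --  a slice keeps bounds 3 .. 7

   --  Converting to a constrained subtype slides the bounds to 1 .. 5.
   subtype Slid is String (1 .. Part'Length);
   Normalized : constant String := Slid (Part);

begin
   Put_Line ("Part'Last         =" & Integer'Image (Part'Last));
   Put_Line ("Part'Length       =" & Integer'Image (Part'Length));
   Put_Line ("Normalized'First  =" & Integer'Image (Normalized'First));
   Put_Line ("Normalized'Last   =" & Integer'Image (Normalized'Last));
   Put_Line ("Blanks in Part    =" & Integer'Image (Count_Blanks (Part)));
end Lower_Bound_Demo;
```

A caller that needs a lower bound of 1 can apply the same subtype conversion to the result of Slice; a routine written with 'First, 'Last, and 'Range works with either choice of bounds.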
A survey of existing implementations shows that most return a string with lower bound equal to Low for this function.
Based on the above evidence, we conclude that the standard should have explicitly said that the bounds are Low and High.
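As an illustration of this conclusion, the following sketch prints the bounds of a Slice result and compares them with a direct slice of the represented string. The instantiation B20 (with the arbitrary choice Max => 20), the procedure name, and the sample values are assumptions made for the example only. On an implementation that follows the rule stated in the summary, the bounds should be Low and High.

```ada
with Ada.Strings.Bounded;
with Ada.Text_IO; use Ada.Text_IO;

procedure Slice_Bounds_Demo is

   --  Hypothetical instantiation; Max => 20 is an arbitrary choice.
   package B20 is new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 20);
   use B20;

   Source : constant Bounded_String := To_Bounded_String ("Hello, world");
   Low    : constant Positive := 8;
   High   : constant Natural  := 12;

   --  The constant takes its bounds from the function result.
   Part  : constant String := Slice (Source, Low, High);

   --  To_String yields the represented String, whose lower bound is 1.
   Whole : constant String := To_String (Source);

begin
   Put_Line ("Part        = " & Part);
   Put_Line ("Part'First  =" & Integer'Image (Part'First));
   Put_Line ("Part'Last   =" & Integer'Image (Part'Last));

   --  With bounds Low .. High, Slice agrees with slicing Whole directly,
   --  and indexing the result at Low does not raise Constraint_Error.
   --  The short-circuit form avoids the indexing if the bounds differ.
   if Part'First = Whole (Low .. High)'First
     and then Part'Last = Whole (Low .. High)'Last
     and then Slice (Source, Low, High) (Low) = Whole (Low)
   then
      Put_Line ("Bounds are Low .. High");
   else
      Put_Line ("Bounds differ from Low .. High");
   end if;
end Slice_Bounds_Demo;
```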
!corrigendum A.04.04(101/1)
Replace the paragraph:
Returns the slice at positions Low through High in the string represented by Source; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source).
by:
Returns the slice at positions Low through High in the string represented by Source; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source). The bounds of the returned string are Low and High.
!ACATS test
Add a test case to CXA4019, CXA4034, and similar tests for Unbounded strings, to check the bounds of Slice. (Unbounded says that the rule is the same as Bounded.)
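A check of this kind for the Unbounded case might look like the sketch below. It is not taken from the ACATS and does not use the ACATS Report package; the procedure name, the sample string, and the PASSED/FAILED messages are hypothetical.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;           use Ada.Text_IO;

procedure Unbounded_Slice_Check is
   Source : constant Unbounded_String := To_Unbounded_String ("abcdefgh");
   Low    : constant Positive := 3;
   High   : constant Natural  := 6;

   --  The constant takes its bounds from the function result.
   Result : constant String := Slice (Source, Low, High);
begin
   --  Under the rule in this AI the bounds must be Low and High,
   --  not 1 .. High - Low + 1.
   if Result'First = Low and then Result'Last = High then
      Put_Line ("PASSED: bounds are Low .. High");
   else
      Put_Line ("FAILED: bounds are"
                & Integer'Image (Result'First)
                & " .."
                & Integer'Image (Result'Last));
   end if;
end Unbounded_Slice_Check;
```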
!appendix

From: Randy Brukardt
Sent: Thursday, July 06, 2000 9:29 PM

While working through the Defect Reports, we noticed that paragraph
A.4.4(101) is not changed by AI-128. This leaves the question of what the
lower bound of its returned string actually is. In a short discussion, the
ARG members at the meeting were unable to decide, and we ignored the issue
for now. We also made a note to have me bring this up on the ARG mailing
list for resolution (or to open an AI).

Here is the text of the paragraphs in question:

100 function Slice (Source : in Bounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return String;

        101   Returns the slice at positions Low through High in the string
        represented by Source; propagates Index_Error if Low >
        Length(Source)+1 {or High > Length(Source)}.

(The text in braces is added by the corrigendum to fix another bug.)

This really doesn't say what the bounds of the result are. If "slice" is
taken literally, then the bounds are Low .. High. But that is different than
most other string subprograms returning a string, which specifically state
that the low bound is 1. Moreover, my reading of the paragraph implies that
slice was meant to informally describe what is returned, not to set bounds,
exceptions, and so on.

So, what are the bounds of the result of this function?

****************************************************************

From: Robert Dewar
Sent: Thursday, July 06, 2000 9:52 PM

The bound should surely be 1 here, regardless of the wording in the RM.
Indeed it seems clear that the RM does intend a bound of 1 to me.

****************************************************************

From: Jon S. Squire
Sent: Friday, July 07, 2000 3:21 AM

>So, what are the bounds of the result of this function?

 type String is array(positive range <>) of Character;

Same as any numeric array.
If the author of  "slice" wanted to be nasty, legal return subscripts
could be  positive'last-(high-low).. positive'last

You may want to be consistent with all array types?
Just do not specify, else you open a can of worms elsewhere.

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000 9:03 AM

> So, what are the bounds of the result of this function?

The lower bound should be 1, I think.

****************************************************************

From: Erhard Ploedereder [ploedere@INFORMATIK.UNI-STUTTGART.DE]
Sent: Friday, July 07, 2000 11:26 AM

I certainly don't see an "of course" reason to make a lower bound = 1 ruling
desirable from a user's point of view and certainly not from an
implementer's point of view, whose code in "Slice" is likely to say
something like "return Source.contents(Low .. High)" and that one undoubtedly
returns Low and High bounds, not 1 ... High-Low+1 bounds.

This actually is a "damned-if-you-do-and-damned-if-you-don't" question.

If you see the call "Slice(A,2,10)" as the functional analogy to "A(2..10)",
which you would love to write but can't, then you would clearly expect
a slice with bounds 2 and 10.

If you see "Slice(A,2,10)" as a general string interface akin to the ones
for fixed-length strings, then a bound of 1 should apply for a vague
uniformity's sake, since the fixed-length interfaces have the lower bound=1 rule.
(It's the only interface for Bounded_Strings that returns a String, so
one can't use a direct uniformity argument with any other interface within
the package itself.)

The analogy to the Ada.Strings.Fixed interfaces does not really apply, since
in all these interfaces the string contents are actually modified by
content-sensitive rules and such a modified, returned string simply is not a
slice, but "a new string", whose bounds need to be specified.

I happen to believe that the words in the standard presently support only
the first interpretation, not the second. Read A.4.4(1) -- "a bounded string
represents a String with lower bound 1" -- A.4.4(101) -- "Slice returns a
slice at positions Low through High in the string represented by (the
bounded string) Source" -- and the definition of slice in 4.1.2.


So, a lower-bound=1 ruling for Slice would certainly require a wording
change to override the semantics implied by the word "slice".

My advice is to do such a fix, if at all, next time around.

IMHO, there is no reason for a change, not even a clarification (which
would simply say: "Yes, a slice is a slice is a slice. Whatever gave you
a different idea ?")

****************************************************************

From: Robert Dewar [dewar@GNAT.COM]
Sent: Friday, July 07, 2000 11:43 AM

The weakness in Erhard's argument is that the semantics of "slice" which
talk about bounds apply ONLY to arrays. There is no array in sight here,
at least not in the input. Let's look:

100 function Slice (Source : in Bounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return String;

I don't see any array in the input, only a bounded string. Yes, it is quite
likely (but certainly not required) that the bounded string be represented
using an array. Yes, it is quite likely that the lower bound of this
implementation array is 1 (but this is not required either). Yes, it is
quite likely that a natural implementation is to return the slice. Indeed,
in the case of GNAT, there is an extra copy precisely to ensure that the
bounds are 1 .. N, but all this is implementation level stuff.

At the semantic level, there really is no array.

On the other hand, if we look at the exact wording:

Returns the slice .. in the string represented by ....

sure sounds like Slice (X,1,10) should be equivalent to
To_String(X)(1..10).

I wonder why GNAT takes the trouble to return 1 here. Probably because this
is what happens in the Unbounded string case, and I am almost certain there
was discussion of the bound of 1 here, or an ACVC test that requires a bound
of 1. I can't believe that we are doing the extra copy for amusement :-)

****************************************************************

From: Robert Dewar [dewar@GNAT.COM]
Sent: Friday, July 07, 2000 11:47 AM

I think the thing to do here is to follow the suggestion of finding out
what current implementations do. If all return 1 as the lower bound, then
clearly that should be the resolution of this issue. If they differ,
we have a tougher case to handle.

P.S. I think Ada made a bad mistake in making slices of strings have the
bounds of the slice. It is MUCH better to normalize all slices with a standard
lower bound as is done in Algol-68. In Algol-68, the notation

  a[b:c]

always returns a lower bound of 1. If you want a different bound on the
result, you say so

  a[b:c @ 3]

A much better design. Why? Because it is SUCH a common bug for routines
taking string arguments to assume the lower bound is 1, and then malfunction
when used with slices. Actually avoiding such bugs leads to a lot of extra
inefficiency and obfuscatory code in routines handling strings, so it is not
surprising it often gets omitted.

There really should be a way of insisting on the lower bound of unconstrained
arrays.

****************************************************************

From: Mike Kamrad
Sent: Friday, July 07, 2000 12:23 PM

Hmmm...that sure has the sound of an amendment to me

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000 12:06 PM

The AverStar implementations return the bounds as given.
GNAT returns the bounds slid to 1..whatever (which is probably
a deliberate decision, since it requires extra code).

I don't like that sort of non-uniformity.

But I still think "with type" is more important.  ;-) ;-)

Anyway, I'd be willing to flip a coin to decide which way we should go
on this, but I do think we should go one way or the other.

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000 12:15 PM

> I think the thing to do here is to follow the suggestion of finding out
> what current implementations do. If all return 1 as the lower bound, then
> clearly that should be the resolution of this issue.

That was my thought, too.

>... If they differ,
> we have a tougher case to handle.

As you can see from my previous message, the first two implementations I
looked at differ.  Too bad.

> P.S. I think Ada made a bad mistake in making slices of strings have the
> bounds of the slice. It is MUCH better to normalize all slices with a standard
> lower bound as is done in Algol-68.

I very much agree.  Oh, well.

****************************************************************

From: Randy Brukardt
Sent: Friday, July 07, 2000 12:50 PM

> >... If they differ, we have a tougher case to handle.
>
> As you can see from my previous message, the first two
> implementations I looked at differ.  Too bad.

OK, it seems obvious we need an AI to decide this (there is no clear
answer). OTOH, it is a low priority AI.

I'll have to check what other implementations do (Janus/Ada appears to
return with a lower bound Low, based on inspection of the code).

BTW, I have adding test cases (to existing tests) to check that the bounds
of the various operations in A.4.3 have a lower bound of 1 as "minimal
value". Perhaps that is an incorrect judgement?

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000 1:41 PM

<<Anyway, I'd be willing to flip a coin to decide which way we should go
on this, but I do think we should go one way or the other.
>>

I agree.

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000 3:16 PM

No I think it is not minimal value, since bounds that are not 1 can be the
source of many bugs.

****************************************************************

From: Pascal Leroy
Sent: Friday, July 07, 2000 3:19 AM

> This really doesn't say what the bounds of the result are. If "slice" is
> taken literally, then the bounds are Low .. High. But that is different
> than most other string subprograms returning a string, which specifically state
> that the low bound is 1.

Well, there aren't many functions returning String in Ada.Strings.Bounded
(the only one I can find is To_String).  Moreover, note that A.4.4 doesn't
have a paragraph equivalent to A.4.3(2).

> Moreover, my reading of the paragraph implies that
> slice was meant to informally describe what is returned, not to set
> bounds, exceptions, and so on.
>
> So, what are the bounds of the result of this function?

We should do the "least surprising" thing.  On the one hand returning 1
might be more natural (!) because people expect strings to have a lower
bound of 1, and generally don't use attribute 'First.  On the other hand, it
would be good if an expression like:

    Slice (S, Lo, Hi) (Lo)

didn't raise Constraint_Error because some users might expect a slice-like
behavior.

I am not sure which way to go...

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000 6:13 PM

Which way does Rational do things now? That's useful information!

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, July 08, 2000 10:56 AM

> BTW, I have adding test cases (to existing tests) to check that the bounds
> of the various operations in A.4.3 have a lower bound of 1 as "minimal
> value". Perhaps that is an incorrect judgement?

No, you were right for A.4.3. A.4.3(2) makes that requirement crystal-clear
as long as you're talking about function results.

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, July 08, 2000 11:12 AM

> As you can see from my previous message, the first two
> implementations I looked at differ.  Too bad.

That's precisely what I was afraid of and why I would prefer not
to have a ruling in this TC, because it might turn out to be
contentious, especially now that we have aired the issue extensively :-)

Clearly, there should be agreement on it and, if the agreement goes
for bound=1, then we write up an AI, which goes in the next TC.

****************************************************************

From: Robert Dewar
Sent: Saturday, July 08, 2000 11:19 AM

An AI is needed in either case if you ask me, and as for the TC, the
less in it the better as far as I am concerned :-)

****************************************************************

From: Robert Dewar
Sent: Saturday, July 08, 2000 11:40 AM

The arguments in favor of resolving this are as follows:

In favor of changing to 1

  This is more robust from the point of view of preventing bugs, also it
  is much more likely that changing from 1 to slice semantics will upset
  existing programs than the other way round.

In favor of changing to slice semantics

  Arguably closer to the existing language in the RM, and more efficient
  in typical implementations.

I am not clear on the best choice I must say, both these arguments are good
and they are hard to balance because they are apples and oranges.

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, July 09, 2000 8:24 AM

FWIW:

On the uniformity issue: I had a look at Unbounded_String, but it says
(A.4.5(82)) that:
   "... Slice subprograms have the same effect as the corresponding bounded-length
    string handling"
Not very helpful.

I'd be slightly inclined to the "slice" semantics, on the argument that Slice
(X, Lo, Hi)(Lo) should not raise C_E. I don't buy the argument that lower bound
1 would raise less bugs for people who do not properly use 'FIRST; any Ada
programmer should jolly well be taught to care about the case where the lower
bound is not 1, otherwise surprises are to be expected in so many cases that
this one would appear extremely rare in comparison!

BTW, A.4.5(78) says that To_String (To_Unbounded_String(S)) = S. This is correct
since sliding occurs in the "=" operator, but is slightly misleading if you read
it as meaning that the double conversion is a no_op (the bounds are changed).
Interestingly enough, the same remark does not apply for the opposite double
conversion (A.4.5(79)).

****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000 8:43 AM

<<I'd be slightly inclined to the "slice" semantics, on the argument that Slice
(X, Lo, Hi)(Lo) should not raise C_E. I don't buy the argument that lower bound
1 would raise less bugs for people who do not properly use 'FIRST; any Ada
programmer should jolly well be taught to care about the case where the lower
bound is not 1, otherwise surprises are to be expected in so many cases that
this one would appear extremely rare in comparison! >>

But Slice (X, Lo, Hi)(Lo)  is such a silly construction that it seems
unsupportable to argue from it except on narrow legal grounds.

As for dismissing the second argument on the basis of what Ada programmers
should jolly well be taught, that may seem reasonable to an educator, but
in the real world, what people "jolly well [were] taught" does not necessarily
dictate what is seen in practice, which is that the failure of string programs
to handle the case of a non-1 lower bound is a common error. I don't find
that surprising, since really it is a bad design point in the language that
strings ever have a lower bound other than 1.

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, July 09, 2000 9:29 AM

I don't argue with facts (i.e. that there are still many people assuming that
the low bound is 1). I'm just saying that the issue happens so often that this
particular case would be insignificant compared with the number of other places
where people have to care about it. Whether it was a good idea in the first
place is another issue - we have to live with it.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000 9:41 PM

Nevertheless, it is likely that changing from a lower bound of 1 to slice
semantics will indeed cause bugs in existing code.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000 9:04 AM

The big piece of missing information here is what other compilers do. We
know the answer for GNAT and Rational. But what about the other current
compiler technologies? Surely someone can provide some more data.

I really think this is the sort of issue where current practice is
significant. If most compilers are one way rather than the other, that
has some influence.

****************************************************************

From: Pascal Leroy
Sent: Monday, July 10, 2000 3:30 AM

> which way does Rational do things now, that's useful information!

Rational returns a lower bound of Low.  I understand that Averstar and Janus
do the same thing, so GNAT appears to be the odd man out.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000 7:28 AM

Janus is not exactly a critical entry here, since it is not a validated
Ada 95 compiler. But we are missing input from Aonix, DDC-I, Irvine, OCS
at least.

Yes, most likely Aonix yields low, but they are not using the latest
Averstar technology as I understand things, so we should double check
this.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000 7:58 AM

I investigated the background a bit on why GNAT returns a lower bound of
1 rather than low. Interestingly this was a fairly recent change (May 1997).
It seems to have been done as part of a uniform fix to ensure that lower bounds
of 1 were returned, rather than any specific problem. The justification is
noted as "change lower bound to 1 to conform with the RM", so clearly I
read the RM at the time as requiring a lower bound of 1, but I really don't
see what led me clearly to that conclusion at this stage. So I must say
I am inclined to just "fix" this in GNAT. But it would be nice to get a
full set of reports from all vendors on this one.

I will also investigate a bit further the historical record at the time
of the GNAT change in this area to see if anything more might have
motivated it.

****************************************************************

From: Joyce L. Tokar
Sent: Monday, July 10, 2000 7:08 AM

Our (DDC-I's) implementation returns a string with the bounds low..high.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000 10:24 PM

Well I just "fixed" GNAT to return Low..High, since this seems to be
the commonest choice, and is indeed the more natural reading of the RM.

****************************************************************

From: Joyce L. Tokar
Sent: Monday, July 10, 2000 8:26 PM

I checked with Oliver about the behavior of their Ada 95 system on this issue --
OC's response is as follows

In the OC Systems' Ada95 implementation, Ada.Strings.Bounded.Slice returns a
slice whose bounds are Low..High, not 1..Length(Source).

****************************************************************


