!standard D (02)                                   00-07-10  AI95-00238/00
!class confirmation 00-07-10
!status work item 00-07-10
!status received 00-07-10
!qualifier Clarification
!priority Low
!difficulty Medium
!subject What is the lower bound of Ada.Strings.Bounded.Slice?

!summary

The bounds of the string returned from Ada.Strings.Bounded.Slice are the
values passed to the routine as Low and High.

!question

What is the lower bound of Ada.Strings.Bounded.Slice? (Low.)

The standard does not clearly say. Here is the text of the paragraphs in
question (including text added by Technical Corrigendum 1):

    function Slice (Source : in Bounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return String;

    Returns the slice at positions Low through High in the string
    represented by Source; propagates Index_Error if
    Low > Length(Source)+1 or High > Length(Source).

This really doesn't say what the bounds of the result are. If "slice" is
taken literally, then the bounds are Low .. High. But that is different
than most other string subprograms returning a string, which specifically
state that the low bound is 1. Moreover, my reading of the paragraph
implies that slice was meant to informally describe what is returned, not
to set bounds, exceptions, and so on.

!response

A.4.4(1) describes the string represented by Source as a string with low
bound 1 and the upper bound determined by the current length. Taking a
slice of this string from Low to High clearly gives a result with bounds
Low and High.

Requiring this function to return a string with lower bound 1 is
appealing, as it is likely to prevent bugs (from assuming that
'Last = 'Length), and is more consistent with the other functions which
return strings in Ada.Strings. However, the existing wording of the
paragraph does not support this interpretation.

A survey of existing implementations shows that most return a string with
lower bound equal to Low for this function.

Therefore, we confirm that the standard intends the bounds to be Low and
High.
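
As an illustration of the confirmed reading, the following is a minimal
sketch, not part of the standard's wording. The procedure name, the
instantiation name Str20, the Max value, and the sample string are all
assumptions chosen for the example. It records the bounds that Slice
returns, shows bounds-independent indexing, and shows how a caller who
wants a lower bound of 1 can slide the result explicitly.

```ada
with Ada.Strings.Bounded;
with Ada.Text_IO;

procedure Slice_Bounds_Demo is
   --  Hypothetical instantiation: the package name and Max are
   --  illustrative only.
   package Str20 is
     new Ada.Strings.Bounded.Generic_Bounded_Length (Max => 20);
   use Str20;

   Source : constant Bounded_String := To_Bounded_String ("Hello, world");

   --  An unconstrained constant takes its bounds from its initial value,
   --  so Part records whatever bounds Slice actually returned.
   Part : constant String := Slice (Source, Low => 8, High => 12);

   --  Converting to a constrained subtype slides the bounds, giving a
   --  copy whose lower bound is 1 regardless of what Slice returned.
   subtype From_One is String (1 .. Part'Length);
   Slid : constant String := From_One (Part);
begin
   --  Under the confirmed reading these are expected to be 8 and 12.
   Ada.Text_IO.Put_Line ("Part'First =" & Integer'Image (Part'First));
   Ada.Text_IO.Put_Line ("Part'Last  =" & Integer'Image (Part'Last));

   --  Indexing through 'Range is correct whatever the bounds are.
   for I in Part'Range loop
      Ada.Text_IO.Put (Part (I));
   end loop;
   Ada.Text_IO.New_Line;

   --  The slid copy has lower bound 1 by construction.
   Ada.Text_IO.Put_Line ("Slid'First =" & Integer'Image (Slid'First));
end Slice_Bounds_Demo;
```

Code that iterates with 'Range or indexes relative to 'First behaves the
same under either interpretation; only code that assumes 'First = 1 is
sensitive to the choice, and the explicit subtype conversion covers that
case.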
!ACATS test

Add a test case to CXA4019, whatever test is constructed for AI-128, and
similar tests for Unbounded strings, to check the bounds of Slice.
(Unbounded says that the rule is the same as Bounded.)

!appendix

From: Randy Brukardt
Sent: Thursday, July 06, 2000  9:29 PM

While working through the Defect Reports, we noticed that paragraph
A.4.4(101) is not changed by AI-128. This leaves the question of what the
lower bound of its returned string actually is.

In a short discussion, the ARG members at the meeting were unable to
decide, and we ignored the issue for now. We also made a note to have me
bring this up on the ARG mailing list for resolution (or to open an AI).

Here is the text of the paragraphs in question:

100 function Slice (Source : in Bounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return String;

101 Returns the slice at positions Low through High in the string
    represented by Source; propagates Index_Error if
    Low > Length(Source)+1 {or High > Length(Source)}.

(The text in braces is added by the corrigendum to fix another bug.)

This really doesn't say what the bounds of the result are. If "slice" is
taken literally, then the bounds are Low .. High. But that is different
than most other string subprograms returning a string, which specifically
state that the low bound is 1. Moreover, my reading of the paragraph
implies that slice was meant to informally describe what is returned, not
to set bounds, exceptions, and so on.

So, what are the bounds of the result of this function?

****************************************************************

From: Robert Dewar
Sent: Thursday, July 06, 2000  9:52 PM

The bound should surely be 1 here, regardless of the wording in the RM.
Indeed it seems clear to me that the RM does intend a bound of 1.

****************************************************************

From: Jon S. Squire
Sent: Friday, July 07, 2000  3:21 AM

>So, what are the bounds of the result of this function?
type String is array(positive range <>) of Character;

Same as any numeric array. If the author of "slice" wanted to be nasty,
legal return subscripts could be

    positive'last-(high-low) .. positive'last

You may want to be consistent with all array types?
Just do not specify, else you open a can of worms elsewhere.

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000  9:03 AM

> So, what are the bounds of the result of this function?

The lower bound should be 1, I think.

****************************************************************

From: Erhard Ploedereder [ploedere@INFORMATIK.UNI-STUTTGART.DE]
Sent: Friday, July 07, 2000  11:26 AM

I certainly don't see an "of course" reason to make a lower bound = 1
ruling desirable from a user's point of view and certainly not from an
implementer's point of view, whose code in "Slice" is likely to say
something like "return Source.contents(Low .. High)" and that one
undoubtedly returns Low and High bounds, not 1 .. High-Low+1 bounds.

This actually is a "damned-if-you-do-and-damned-if-you-don't" question.

If you see the call "Slice(A,2,10)" as the functional analogy to
"A(2..10)", which you would love to write but can't, then you would
clearly expect a slice with bounds 2 and 10.

If you see "Slice(A,2,10)" as a general string interface akin to the ones
for fixed-length strings, then a bound of 1 should apply for a vague
uniformity's sake, since the fixed-length interfaces have the lower
bound=1 rule. (It's the only interface for Bounded_Strings that returns a
String, so one can't use a direct uniformity argument with any other
interface within the package itself.)

The analogy to the Ada.Strings.Fixed interfaces does not really apply,
since in all these interfaces the string contents are actually modified by
content-sensitive rules and such a modified, returned string simply is not
a slice, but "a new string", whose bounds need to be specified.
I happen to believe that the words in the standard presently support only
the first interpretation, not the second. Read A.4.4(1) -- "a bounded
string represents a String with lower bound 1" -- A.4.4(101) -- "Slice
returns a slice at positions Low through High in the string represented by
(the bounded string) Source" -- and the definition of slice in 4.1.2.

So, a lower-bound=1 ruling for Slice would certainly require a wording
change to override the semantics implied by the word "slice". My advice is
to do such a fix, if at all, next time around. IMHO, there is no reason
for a change, not even a clarification (which would simply say: "Yes, a
slice is a slice is a slice. Whatever gave you a different idea ?")

****************************************************************

From: Robert Dewar [dewar@GNAT.COM]
Sent: Friday, July 07, 2000  11:43 AM

The weakness in Erhard's argument is that the semantics of "slice" which
talk about bounds apply ONLY to arrays. There is no array in sight here,
at least not in the input. Let's look:

100 function Slice (Source : in Bounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return String;

I don't see any array in the input, only a bounded string. Yes, it is
quite likely (but certainly not required) that the bounded string be
represented using an array. Yes, it is quite likely that the lower bound
of this implementation array is 1 (but this is not required either). Yes,
it is quite likely that a natural implementation is to return the slice.
Indeed, in the case of GNAT, there is an extra copy precisely to ensure
that the bounds are 1 .. N, but all this is implementation level stuff.
At the semantic level, there really is no array.

On the other hand, if we look at the exact wording:

    Returns the slice .. in the string represented by ....

sure sounds like Slice (X,1,10) should be equivalent to
To_String(X)(1..10).

I wonder why GNAT takes the trouble to return 1 here.
Probably because this is what happens in the Unbounded string case, and I
am almost certain there was discussion of the bound of 1 here, or an ACVC
test that requires a bound of 1. I can't believe that we are doing the
extra copy for amusement :-)

****************************************************************

From: Robert Dewar [dewar@GNAT.COM]
Sent: Friday, July 07, 2000  11:47 AM

I think the thing to do here is to follow the suggestion of finding out
what current implementations do. If all return 1 as the lower bound, then
clearly that should be the resolution of this issue. If they differ, we
have a tougher case to handle.

P.S. I think Ada made a bad mistake in making slices of strings have the
bounds of the slice. It is MUCH better to normalize all slices with a
standard lower bound as is done in Algol-68. In Algol-68, the notation

    a[b:c]

always returns a lower bound of 1. If you want a different bound on the
result, you say so

    a[b:c @ 3]

A much better design. Why? Because it is SUCH a common bug for routines
taking string arguments to assume the lower bound is 1, and then
malfunction when used with slices. Actually avoiding such bugs leads to a
lot of extra inefficiency and obfuscatory code in routines handling
strings, so it is not surprising it often gets omitted.

There really should be a way of insisting on the lower bound of
unconstrained arrays.

****************************************************************

From: Mike Kamrad
Sent: Friday, July 07, 2000  12:23 PM

Hmmm...that sure has the sound of an amendment to me

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000  12:06 PM

The AverStar implementations return the bounds as given. GNAT returns the
bounds slid to 1..whatever (which is probably a deliberate decision, since
it requires extra code).

I don't like that sort of non-uniformity. But I still think "with type" is
more important.
;-) ;-)

Anyway, I'd be willing to flip a coin to decide which way we should go on
this, but I do think we should go one way or the other.

****************************************************************

From: Robert A Duff
Sent: Friday, July 07, 2000  12:15 PM

> I think the thing to do here is to follow the suggestion of finding out
> what current implementations do. If all return 1 as the lower bound, then
> clearly that should be the resolution of this issue.

That was my thought, too.

>... If they differ,
> we have a tougher case to handle.

As you can see from my previous message, the first two implementations I
looked at differ. Too bad.

> P.S. I think Ada made a bad mistake in making slices of strings have the
> bounds of the slice. It is MUCH better to normalize all slices with a
> standard lower bound as is done in Algol-68.

I very much agree. Oh, well.

****************************************************************

From: Randy Brukardt
Sent: Friday, July 07, 2000  12:50 PM

> >... If they differ, we have a tougher case to handle.
>
> As you can see from my previous message, the first two
> implementations I looked at differ. Too bad.

OK, it seems obvious we need an AI to decide this (there is no clear
answer). OTOH, it is a low priority AI. I'll have to check what other
implementations do (Janus/Ada appears to return with a lower bound Low,
based on inspection of the code).

BTW, I have been adding test cases (to existing tests) to check that the
bounds of the various operations in A.4.3 have a lower bound of 1 as
"minimal value". Perhaps that is an incorrect judgement?

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000  1:41 PM

<>

I agree.

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000  3:16 PM

No, I think it is not minimal value, since bounds that are not 1 can be
the source of many bugs.
****************************************************************

From: Pascal Leroy
Sent: Friday, July 07, 2000  3:19 AM

> This really doesn't say what the bounds of the result are. If "slice" is
> taken literally, then the bounds are Low .. High. But that is different
> than most other string subprograms returning a string, which specifically
> state that the low bound is 1.

Well, there aren't many functions returning String in Ada.Strings.Bounded
(the only one I can find is To_String). Moreover, note that A.4.4 doesn't
have a paragraph equivalent to A.4.3(2).

> Moreover, my reading of the paragraph implies that
> slice was meant to informally describe what is returned, not to set
> bounds, exceptions, and so on.
>
> So, what are the bounds of the result of this function?

We should do the "least surprising" thing. On the one hand returning 1
might be more natural (!) because people expect strings to have a lower
bound of 1, and generally don't use attribute 'First. On the other hand,
it would be good if an expression like:

    Slice (S, Lo, Hi) (Lo)

didn't raise Constraint_Error because some users might expect a
slice-like behavior.

I am not sure which way to go...

****************************************************************

From: Robert Dewar
Sent: Friday, July 07, 2000  6:13 PM

Which way does Rational do things now? That's useful information!

****************************************************************

From: Erhard Ploedereder
Sent: Saturday, July 08, 2000  10:56 AM

> BTW, I have been adding test cases (to existing tests) to check that the
> bounds of the various operations in A.4.3 have a lower bound of 1 as
> "minimal value". Perhaps that is an incorrect judgement?

No, you were right for A.4.3. A.4.3(2) makes that requirement
crystal-clear as long as you're talking about function results.
****************************************************************

From: Erhard Ploedereder
Sent: Saturday, July 08, 2000  11:12 AM

> As you can see from my previous message, the first two
> implementations I looked at differ. Too bad.

That's precisely what I was afraid of and why I would prefer not to have a
ruling in this TC, because it might turn out to be contentious, especially
now that we have aired the issue extensively :-)

Clearly, there should be agreement on it and, if the agreement goes for
bound=1, then we write up an AI, which goes in the next TC.

****************************************************************

From: Robert Dewar
Sent: Saturday, July 08, 2000  11:19 AM

An AI is needed in either case if you ask me, and as for the TC, the less
in it the better as far as I am concerned :-)

****************************************************************

From: Robert Dewar
Sent: Saturday, July 08, 2000  11:40 AM

The arguments in favor of resolving this are as follows:

In favor of changing to 1

  This is more robust from the point of view of preventing bugs, also it
  is much more likely that changing from 1 to slice semantics will upset
  existing programs than the other way round.

In favor of changing to slice semantics

  Arguably closer to the existing language in the RM, and more efficient
  in typical implementations.

I am not clear on the best choice I must say, both these arguments are
good and they are hard to balance because they are apples and oranges.

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, July 09, 2000  8:24 AM

FWIW:
On the uniformity issue: I had a look at Unbounded_String, but it says
(A.4.5(82)) that:
"... Slice subprograms have the same effect as the corresponding
bounded-length string handling"
Not very helpful.

I'd be slightly inclined to the "slice" semantics, on the argument that
Slice (X, Lo, Hi)(Lo) should not raise C_E.
I don't buy the argument that lower bound 1 would cause fewer bugs for
people who do not properly use 'FIRST; any Ada programmer should jolly
well be taught to care about the case where the lower bound is not 1,
otherwise surprises are to be expected in so many cases that this one
would appear extremely rare in comparison!

BTW, A.4.5(78) says that To_String (To_Unbounded_String(S)) = S. This is
correct since sliding occurs in the "=" operator, but is slightly
misleading if you read it as meaning that the double conversion is a
no-op (the bounds are changed). Interestingly enough, the same remark does
not apply for the opposite double conversion (A.4.5(79)).

****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000  8:43 AM

<>

But Slice (X, Lo, Hi)(Lo) is such a silly construction that it seems
unsupportable to argue from it except on narrow legal grounds.

As for dismissing the second argument on the basis of what Ada programmers
should jolly well be taught, that may seem reasonable to an educator, but
in the real world, what people "jolly well [were] taught" does not
necessarily dictate what is seen in practice, which is that the failure of
string programs to handle the case of a non-1 lower bound is a common
error. I don't find that surprising, since really it is a bad design point
in the language that strings ever have a lower bound other than 1.

****************************************************************

From: Jean-Pierre Rosen
Sent: Sunday, July 09, 2000  9:29 AM

I don't argue with facts (i.e. that there are still many people assuming
that the low bound is 1). I'm just saying that the issue happens so often
that this particular case would be insignificant compared with the number
of other places where people have to care about it.

Whether it was a good idea in the first place is another issue - we have
to live with it.
****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000  9:41 PM

Nevertheless, it is likely that changing from a lower bound of 1 to slice
semantics will indeed cause bugs in existing code.

****************************************************************

From: Robert Dewar
Sent: Sunday, July 09, 2000  9:04 AM

The big piece of missing information here is what other compilers do. We
know the answer for GNAT and Rational. But what about the other current
compiler technologies? Surely someone can provide some more data.

I really think this is the sort of issue where current practice is
significant. If most compilers are one way rather than the other, that has
some influence.

****************************************************************

From: Pascal Leroy
Sent: Monday, July 10, 2000  3:30 AM

> Which way does Rational do things now? That's useful information!

Rational returns a lower bound of Low. I understand that Averstar and
Janus do the same thing, so GNAT appears to be the odd man out.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000  7:28 AM

Janus is not exactly a critical entry here, since it is not a validated
Ada 95 compiler. But we are missing input from Aonix, DDC-I, Irvine, OCS
at least. Yes, most likely Aonix yields low, but they are not using the
latest Averstar technology as I understand things, so we should double
check this.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000  7:58 AM

I investigated the background a bit on why GNAT returns a lower bound of 1
rather than low. Interestingly this was a fairly recent change (May 1997).
It seems to have been done as part of a uniform fix to ensure that lower
bounds of 1 were returned, rather than any specific problem.
The justification is noted as "change lower bound to 1 to conform with the
RM", so clearly I read the RM at the time as requiring a lower bound of 1,
but I really don't see what led me clearly to that conclusion at this
stage.

So I must say I am inclined to just "fix" this in GNAT. But it would be
nice to get a full set of reports from all vendors on this one.

I will also investigate a bit further the historical record at the time of
the GNAT change in this area to see if anything more might have motivated
it.

****************************************************************

From: Joyce L. Tokar
Sent: Monday, July 10, 2000  7:08 AM

Our (DDC-I's) implementation returns a string with the bounds low..high.

****************************************************************

From: Robert Dewar
Sent: Monday, July 10, 2000  10:24 PM

Well I just "fixed" GNAT to return Low..High, since this seems to be the
commonest choice, and is indeed the more natural reading of the RM.

****************************************************************

From: Joyce L. Tokar
Sent: Monday, July 10, 2000  8:26 PM

I checked with Oliver about the behavior of their Ada 95 system on this
issue -- OC's response is as follows:

In the OC Systems' Ada95 implementation, Ada.Strings.Bounded.Slice returns
a slice whose bounds are Low..High, not 1..Length(Source).

****************************************************************