Version 1.4 of ais/ai-00301.txt

!standard A.4.5          02-08-26 AI95-00301/03
!class amendment 02-06-12
!status work item 02-06-12
!status received 02-06-12
!priority Medium
!difficulty Easy
!subject Missing operations in Ada.Strings.Unbounded
!summary
Additional Index functions are added to Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded. Some additional operations are added to Ada.Strings.Unbounded and Ada.Strings.Bounded. I/O operations on unbounded strings are provided in a new child package of Ada.Text_IO.
!problem
The interface defined for Ada.Strings.Unbounded contains several instances where the Unbounded_String abstraction is incomplete. For instance, the Insert, Overwrite, and Replace_Slice routines do not include a version in which the new portion is an Unbounded_String. Thus, it is necessary to leave the Unbounded_String abstraction (by converting to String) in order to use these operations. This extra conversion also carries an unnecessary time and storage overhead.
A similar problem occurs when a slice of an Unbounded_String is needed. The Slice function returns a String, which is outside of the Unbounded_String abstraction.
Another problem is that Unbounded_Strings typically are implemented as a controlled type. That means that an assignment of an Unbounded_String has substantial overhead in the form of calls to Finalize and Adjust, and probably includes allocation of memory. Unbounded_Strings are often given values with the function To_Unbounded_String. However, when this function is used in an assignment statement, memory may be allocated twice (once by the function, and once by Adjust), which is substantial extra overhead. A procedure version of To_Unbounded_String would avoid this problem.
A commonly encountered problem is the need to find all of the occurrences of a pattern in a string. The Index function always searches the entire string. This is not a major problem for a String, as searching the appropriate slice is easy. However, for an Unbounded_String, the only practical solution is to convert the Unbounded_String to a String, again breaking the abstraction. A variant with a starting index as a parameter would be very useful.
And finally, I/O operations with Unbounded_Strings as parameters are not provided. These are commonly needed, and (in the case of Get_Line) are not trivial to write. A child package similar to the ones provided for Complex types would be valuable.
!proposal
Additional Index functions are added to Ada.Strings.Fixed:
    function Index (Source  : in String;
                    Pattern : in String;
                    From    : in Positive;
                    Going   : in Direction := Forward;
                    Mapping : in Maps.Character_Mapping := Maps.Identity)
       return Natural;
    -- If Going = Forward, this is equivalent to
    --    Index (Source(From..Source'Last), Pattern, Going, Mapping)
    -- Otherwise, this is equivalent to
    --    Index (Source(Source'First..From), Pattern, Going, Mapping)
    -- Index_Error is propagated if From is not in Source'range.

    function Index (Source  : in String;
                    Pattern : in String;
                    From    : in Positive;
                    Going   : in Direction := Forward;
                    Mapping : in Maps.Character_Mapping_Function)
       return Natural;
    -- If Going = Forward, this is equivalent to
    --    Index (Source(From..Source'Last), Pattern, Going, Mapping)
    -- Otherwise, this is equivalent to
    --    Index (Source(Source'First..From), Pattern, Going, Mapping)
    -- Index_Error is propagated if From is not in Source'range.

    function Index (Source : in String;
                    Set    : in Maps.Character_Set;
                    From   : in Positive;
                    Test   : in Membership := Inside;
                    Going  : in Direction := Forward)
       return Natural;
    -- If Going = Forward, this is equivalent to
    --    Index (Source(From..Source'Last), Set, Test, Going)
    -- Otherwise, this is equivalent to
    --    Index (Source(Source'First..From), Set, Test, Going)
    -- Index_Error is propagated if From is not in Source'range.

    function Index_Non_Blank (Source : in String;
                              From   : in Positive;
                              Going  : in Direction := Forward)
       return Natural;
    -- If Going = Forward, this is equivalent to
    --    Index_Non_Blank (Source(From..Source'Last), Going)
    -- Otherwise, this is equivalent to
    --    Index_Non_Blank (Source(Source'First..From), Going)
    -- Index_Error is propagated if From is not in Source'range.
Similar functions are added to Ada.Strings.Bounded and Ada.Strings.Unbounded.
The following operation is added to package Ada.Strings.Bounded:
    procedure To_Bounded_String (Target : out Bounded_String;
                                 Str    : in String);
    -- Identical in effect to Target := To_Bounded_String (Str);
The following operation is added to package Ada.Strings.Unbounded:
    procedure To_Unbounded_String (Target : out Unbounded_String;
                                   Str    : in String);
    -- Identical in effect to Target := To_Unbounded_String (Str);
The following operations are added to package Ada.Strings.Unbounded:
    function Slice (Source : in Unbounded_String;
                    Low    : in Positive;
                    High   : in Natural)
       return Unbounded_String;
    -- Identical to To_Unbounded_String (Slice (Source, Low, High));

    procedure Slice (Source : in Unbounded_String;
                     Target : out Unbounded_String;
                     Low    : in Positive;
                     High   : in Natural);
    -- Identical to Target := To_Unbounded_String (Slice (Source, Low, High));

    function Replace_Slice (Source : in Unbounded_String;
                            Low    : in Positive;
                            High   : in Natural;
                            By     : in Unbounded_String)
       return Unbounded_String;
    -- Identical in effect to
    --    Replace_Slice (Source, Low, High, To_String (By));

    procedure Replace_Slice (Source : in out Unbounded_String;
                             Low    : in Positive;
                             High   : in Natural;
                             By     : in Unbounded_String);
    -- Identical in effect to
    --    Replace_Slice (Source, Low, High, To_String (By));

    function Insert (Source   : in Unbounded_String;
                     Before   : in Positive;
                     New_Item : in Unbounded_String)
       return Unbounded_String;
    -- Identical in effect to Insert (Source, Before, To_String (New_Item));

    procedure Insert (Source   : in out Unbounded_String;
                      Before   : in Positive;
                      New_Item : in Unbounded_String);
    -- Identical in effect to Insert (Source, Before, To_String (New_Item));

    function Overwrite (Source   : in Unbounded_String;
                        Position : in Positive;
                        New_Item : in Unbounded_String)
       return Unbounded_String;
    -- Identical in effect to
    --    Overwrite (Source, Position, To_String (New_Item));

    procedure Overwrite (Source   : in out Unbounded_String;
                         Position : in Positive;
                         New_Item : in Unbounded_String);
    -- Identical in effect to
    --    Overwrite (Source, Position, To_String (New_Item));
Similar operations are added to Ada.Strings.Bounded.
The following child package is defined:
    with Ada.Strings.Unbounded;
    package Ada.Text_IO.Unbounded_IO is

       procedure Put (File : in File_Type;
                      Item : in Ada.Strings.Unbounded.Unbounded_String);
       -- Identical in effect to
       --    Ada.Text_IO.Put (File, To_String (Item));

       procedure Put (Item : in Ada.Strings.Unbounded.Unbounded_String);
       -- Identical in effect to
       --    Ada.Text_IO.Put (To_String (Item));

       procedure Put_Line (File : in Ada.Text_IO.File_Type;
                           Item : in Ada.Strings.Unbounded.Unbounded_String);
       -- Identical in effect to
       --    Ada.Text_IO.Put_Line (File, To_String (Item));

       procedure Put_Line (Item : in Ada.Strings.Unbounded.Unbounded_String);
       -- Identical in effect to
       --    Ada.Text_IO.Put_Line (To_String (Item));

       function Get_Line (File : in File_Type)
          return Ada.Strings.Unbounded.Unbounded_String;
       -- Equivalent to:
       --    declare
       --       Buffer : String (1 .. Positive'Last);
       --       Last   : Natural;
       --    begin
       --       Get_Line (File, Buffer, Last);
       --       return Ada.Strings.Unbounded.To_Unbounded_String(Buffer(1..Last));
       --    end;
       -- Note: This code will not work on most compilers.

       function Get_Line return Ada.Strings.Unbounded.Unbounded_String;
       -- Identical to Get_Line (Ada.Text_IO.Current_Input);

       procedure Get_Line (File : in File_Type;
                           Item : out Ada.Strings.Unbounded.Unbounded_String);
       -- Equivalent to Item := Get_Line (File);

       procedure Get_Line (Item : out Ada.Strings.Unbounded.Unbounded_String);
       -- Equivalent to Item := Get_Line;

    end Ada.Text_IO.Unbounded_IO;
A similar package, Ada.Wide_Text_IO.Wide_Unbounded_IO, is defined for wide unbounded strings.
!wording
Detailed wording changes are not yet provided.
However, the existing Index routines ought to be described in terms of the new versions with the From parameter, not the other way around.
Get_Line ought to be described in terms of words similar to the existing Get_Line, not as above.
!discussion
The new operations added to Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded could potentially cause new ambiguities in programs if there is a use clause for the string package. However, this is unlikely, and no programs change meaning (any incompatibilities cause compile-time errors).
The procedure version of To_Bounded_String and To_Unbounded_String probably would be better named 'Set'. However, that would increase the possibility of a name collision incompatibility, so we did not change the names.
An earlier proposal for the Index functions was to add a defaulted From parameter to the end of the existing routine. This would look like:
    function Index (Source  : in String;
                    Pattern : in String;
                    Going   : in Direction := Forward;
                    Mapping : in Maps.Character_Mapping := Maps.Identity;
                    From    : in Positive := 1)
       return Natural;
This has the advantage of not adding new routines. However, there are several serious drawbacks. First, this is significantly more incompatible than just adding new routines. Adding a parameter like this affects renames and overridings. One compiler vendor notes that they've seen derivations of Unbounded_Strings in their customers' code.
Second, the correct default value is not clear. For Ada.Strings.Fixed, it depends on the bounds of Source and Going. Since a default value cannot depend on the values of other parameters, it is not possible to write the correct default value. For Ada.Strings.Bounded and Ada.Strings.Unbounded, the value depends on Going and Length(Source), which again is a problem. We could define the parameter to have type Natural, and define 0 to mean the start or end of Source as needed, but that seems very ugly and confusing.
Thus, we added new routines with the From parameter as the last non-defaulted parameter.
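As an illustration only (it is not part of the proposal), a caller might use the new Ada.Strings.Fixed.Index as in the following sketch. The function name Nth_Index and its parameter N are hypothetical, and the sketch assumes a library that already provides Index with the From parameter as proposed above.

```ada
with Ada.Strings.Fixed;

--  Hypothetical helper: returns the position of the Nth non-overlapping
--  occurrence of Pattern in Source (counted from the left), or 0 if
--  there are fewer than N occurrences.
function Nth_Index
  (Source  : String;
   Pattern : String;
   N       : Positive) return Natural
is
   From  : Integer := Source'First;
   Found : Natural := 0;
begin
   --  An empty pattern would raise Pattern_Error in Index; treat it
   --  here as "no occurrence".
   if Pattern'Length = 0 then
      return 0;
   end if;

   for Count in 1 .. N loop
      --  Stop when too little of Source remains to hold Pattern.
      --  This also keeps From inside Source'Range, so Index_Error
      --  cannot be raised by the call below.
      if From > Source'Last - Pattern'Length + 1 then
         return 0;
      end if;

      --  From is passed positionally; Going and Mapping keep their
      --  defaults (Forward and Maps.Identity).
      Found := Ada.Strings.Fixed.Index (Source, Pattern, From);
      if Found = 0 then
         return 0;
      end if;

      From := Found + Pattern'Length;  --  No overlapping occurrences
   end loop;

   return Found;
end Nth_Index;
```

Under these assumptions, Nth_Index ("abcabcabc", "abc", 2) yields 4, and no slice of Source is ever copied.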
Ada.Text_IO.Unbounded_IO.Get_Line can be implemented as follows:
    procedure Get_Line (File : in File_Type;
                        Item : out Ada.Strings.Unbounded.Unbounded_String) is
       Buffer : String (1 .. 80);
       Last   : Natural;
    begin
       Get_Line (File, Buffer, Last);
       Ada.Strings.Unbounded.To_Unbounded_String (Item, Buffer(1..Last));
       while Last = Buffer'Last loop
          Get_Line (File, Buffer, Last);
          Ada.Strings.Unbounded.Append (Item, Buffer(1..Last));
       end loop;
    end Get_Line;
However, all of the I/O operations can be better implemented if they have access to the internal representation of the Unbounded_String type. That can be accomplished with an implementation-defined child package of Ada.Strings.Unbounded.
Ada.Text_IO.Unbounded_IO is defined as a child of Text_IO to match the already existing Ada.Text_IO.Complex_IO. This makes the definition of a similar package for Bounded_Strings complex. It could be defined something like:
    with Ada.Strings.Bounded;
    generic
       with package Bounded is
          new Ada.Strings.Bounded.Generic_Bounded_Length (<>);
    package Ada.Text_IO.Bounded_IO is
       procedure Put (File : in File_Type;
                      Item : in Bounded.Bounded_String);
       ...
    end Ada.Text_IO.Bounded_IO;
but this didn't seem to have sufficient benefit to mandate. (An implementation can add this package if it likes.) Note that we're never going to be totally consistent here, as Ada.Text_IO.Fixed_IO already exists for another purpose, and such a package would not make any sense anyway, as Text_IO already includes the needed operations.
!example
Consider the task of finding all occurrences of a given string Pattern in a given Unbounded_String Source. Currently, one either has to convert the whole unbounded string into a string and search on that, or implement the search oneself (ignoring character mappings here):
    declare
       Src_Length : constant Natural := Length (Source);
       From       : Positive := 1;
    begin
       while From + Pattern'Length - 1 <= Src_Length loop
          if Slice (Source, From, From + Pattern'Length - 1) = Pattern then
             -- Found an occurrence of Pattern; process it, then advance From
             From := From + Pattern'Length; -- No overlapping occurrences!
          else
             From := From + 1;
          end if;
       end loop;
    end;
This incurs considerable extra storage overhead, because the (necessary!) call to Slice is required to return a copy of the slice of Source's underlying string data. With the proposed new Index functions, this could be simplified to:
    declare
       Src_Length : constant Natural := Length (Source);
       From       : Positive := 1;
       Idx        : Natural;
    begin
       while From + Pattern'Length - 1 <= Src_Length loop
          Idx := Ada.Strings.Unbounded.Index (Source, Pattern, From);
          exit when Idx = 0;
          -- Found an occurrence of Pattern starting at Idx.
          -- Process this occurrence, then advance From
          From := Idx + Pattern'Length; -- No overlapping occurrences!
       end loop;
    end;
which may be much more efficient because there are no intermediary copies to String involved.
The same is true for writing an Unbounded_String. Currently, one has to do
Ada.Text_IO.Put_Line (Ada.Strings.Unbounded.To_String (My_Unbounded_String));
which also incurs this extra intermediary String representation, which could be avoided with the new operation
    Ada.Text_IO.Unbounded_IO.Put_Line (My_Unbounded_String);
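Reading benefits in a similar way, since the caller no longer needs to guess a buffer size. The following is a minimal sketch, assuming an implementation provides Ada.Text_IO.Unbounded_IO with the procedure form of Get_Line given in the proposal; the function name Longest_Line_Length is hypothetical and serves only to give the loop something to compute.

```ada
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;
with Ada.Strings.Unbounded;

--  Hypothetical helper: returns the length of the longest line that
--  can still be read from File, which must be open for input.
function Longest_Line_Length
  (File : Ada.Text_IO.File_Type) return Natural
is
   use Ada.Strings.Unbounded;

   Line    : Unbounded_String;
   Longest : Natural := 0;
begin
   while not Ada.Text_IO.End_Of_File (File) loop
      --  Each call reads one whole line, however long it is; no
      --  fixed-size String buffer is declared by the caller.
      Ada.Text_IO.Unbounded_IO.Get_Line (File, Line);
      Longest := Natural'Max (Longest, Length (Line));
   end loop;

   return Longest;
end Longest_Line_Length;
```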
!appendix

Editor's note: The original proposal was submitted by Thomas Wolf on
Wednesday, June 12, 2002. It was edited slightly, but otherwise used intact
as the initial draft of this AI.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 12, 2002  10:14 PM

Thanks for the submission.

I've recently had similar problems using Unbounded_Strings. The fact that
you have to frequently convert to strings to get anything done is rather
annoying. I'm surprised I haven't heard about it before. (I personally
hadn't tried to use Unbounded_Strings before the recent program, a spam
scanner.)

Some comments:

I ran into a significant performance problem when I needed to test whether
the start of an unbounded string matched some other string. The simple
solution of
   if Index (Source, To_String(Pattern)) = 1 then
turned out to be horribly inefficient.

The better solution of
   if Slice (Source, 1, Length(Pattern)) = To_String(Pattern) then
is wrong if Source is too short (Constraint_Error is raised).

The correct solution of
   if Length (Source) >= Length (Pattern) and then
      Slice (Source, 1, Length(Pattern)) = To_String(Pattern) then
also is rather inefficient, because it has 5 function calls and two
unconstrained string returning function calls, which need to use heap or
other expensive memory allocation, and of course, two extra copies.

Being a compiler writer, I ended up declaring
    Is_Prefix (Source, Pattern : in Unbounded_String)
in a child package, and implemented it myself. But I don't recommend that
solution to everyone.

I also ran into the missing Index from the middle issue (which comes up when
you want to process all of the matches for the pattern), which I solved by
punting: convert to a String and use Ada.Strings.Fixed.Index.

> Also, while there is a function To_Unbounded_String, there is
> no procedural variant for this operation. However, such a
> procedure would be very useful, for it might be more efficient
> than the function.

Arguably, a procedural variant of To_String also would be useful, and it
certainly would be more efficient. (Indeed, the To_Unbounded_String
function, which returns a bounded record type, is usually cheap in most
implementations.)

> Implementation advice: all these new operations should be implemented such
> that no extra (intermediary) copy of the string (slice) data is created.

I'm not sure there is much point to this. The existing implementations of
Ada.Strings.Unbounded vary wildly in efficiency, I don't see any reason to
try to say anything more about any new functions.

I realize that you are trying to prevent people from implementing these
simply by calling Slice or To_String, but an implementor that would do that
probably has done it in many other parts of Ada.Strings.Unbounded.

...

>   function Get_Line
>     (File : in Ada.Text_IO.File_Type)
>     return Unbounded_String;
>   --  Reads up to the end of the line, returning the whole line as an
>   --  unbounded string. If a line terminator is met, Skip_Line is (in effect)
>   --  called with a spacing of 1, and the characters up to but not including
>   --  the line terminator are returned. If a file terminator is met, the
>   --  characters up to but not including the file terminator are returned,
>   --  except if the file terminator is met before any characters have been
>   --  read; in this case, Ada.IO_Exceptions.End_Error is raised.

There are two problems with this. First, you've left out the part about
what happens if the string is full. This is a real issue on compilers where
Integer is 16-bit. The standard has to cover all of the cases, even if they
are unlikely on most implementations.

Second, there is no good reason to change the semantics of Get_Line at a
file terminator. But this is based on a serious misconception about Ada (see
below). And in any case, Get_Line should operate like the Get_Line we all
expect.

> Implementation advice: the Put and Put_Line operations should be implemented
> such that no extra copy of the string data occurs.

This one is even more dubious than the first one. It's likely that the
implementation has to copy the string data for the existing Put and
Put_Line, to put it into a buffer or to pass it to the operating system.
You're saying that it can't do that? I think every implementation would have
to violate this advice, making it useless. I realize your point is that you
want the string directly written out, but again I think you have to leave
this to the implementor.


> BTW: since Ada.Text_IO.End_Of_Line returns True when a line *or a file*
> terminator are next in the input (A.10.5(13)), I believe it would make sense
> to clearly define that Ada.Text_IO.Get_Line does *not* raise End_Error upon
> the last line of a file if that last line has no line terminator, but ends
> on EOF directly. Otherwise, it may get rather difficult to read the last
> line of such a file. In other words, define "end of the line" as "line
> terminator or EOF", and Get_Line to read up to the end of the line (and
> call Skip_Line (1) if it stopped on a line terminator, but not if it
> stopped on EOF).

Why do you think this is not true? There is always a line terminator (and
page terminator) before the file terminator, see A.10(7). If a file does not
explicitly have a line terminator at the end of the file, the implementation
has to implicitly provide one. That's been true since Ada 80.


> It should also be evaluated whether similar additions as proposed above
> for bounded strings (Ada.Strings.Bounded) would make sense.

Deleting bounded strings would make more sense. :-)


In your discussion section, you probably should mention that being forced to
leave the unbounded string abstraction to do common operations substantially
weakens the abstraction. If you have to do it often enough, you begin to
wonder what exactly you are gaining by using Unbounded_String. (I doubt that
I will use Unbounded_String again, because it was as messy or messier than
simply using good old regular strings - in large part because I continually
had to convert back to String.)

****************************************************************

From: Ted Baker
Sent: Thursday, June 13, 2002  5:55 AM

I would second Randy's comments on Unbounded_String.  Back when we
were teaching Ada here I tried to introduce this package for
students to use in programming assignments.  I ended up giving up.
I reverted to an Ada 83-style package of my own construction in
which the (length, string) components were both visible.
Otherwise, the students ended up doing so many copying conversion
operations that I felt embarrassed.  I was embarrassed because the
less perceptive students might think I meant to teach that
sloppiness about recopying arrays is good style, and the better
students might conclude that Ada is an inherently inefficient
language.

****************************************************************

From: Thomas Wolf
Sent: Thursday, June 13, 2002  5:20 AM

On 12 Jun 2002 at 22:14, Randy Brukardt wrote:

> I've recently had similar problems using Unbounded_Strings.

Good to see that I'm not alone with this!

> Second, there is no good reason to change the semantics of Get_Line at a
> file terminator. But this is based on a serious misconception about Ada (see
> below). And in any case, Get_Line should operate like the Get_Line we all
> expect.

About the misconception, see below.

> > Implementation advice: the Put and Put_Line operations should be
> > implemented such that no extra copy of the string data occurs.
>
> This one is even more dubious than the first one. It's likely that the
> implementation has to copy the string data for the existing Put and
> Put_Line, to put it into a buffer or to pass it to the operating system.
> You're saying that it can't do that? I think every implementation would have
> to violate this advice, making it useless. I realize your point is that you
> want the string directly written out, but again I think you have to leave
> this to the implementor.

No. The intention of this is just to discourage an implementation like

procedure Put_Line (S : in Unbounded_String) is
begin
  Ada.Text_IO.Put_Line (To_String (S));
end Put_Line;

(well, at least discourage it unless the compiler is smart enough to
avoid doing in effect

  declare
     Tmp : String := To_String (S);
  begin
     Ada.Text_IO.Put_Line (Tmp);
  end;

That's what I meant by "extra copy". Note that I wrote "extra copy",
not just "copy".)

> Why do you think this is not true? There is always a line terminator (and
> page terminator) before the file terminator, see A.10(7). If a file does not
> explicitly have a line terminator at the end of the file, the implementation
> has to implicitly provide one. That's been true since Ada 80.

Indeed. I misread A.10(7) and was tricked into thinking it wasn't so
because I have come across at least one implementation that didn't do
it this way and failed horribly on the last line of a file if that line
didn't have an end-of-line. But you're right, the whole issue is moot.

> > It should also be evaluated whether similar additions as proposed above
> > for bounded strings (Ada.Strings.Bounded) would make sense.
>
> Deleting bounded strings would make more sense. :-)

I do see the smiley, but much as I'd like to see it go, it wouldn't be
backwards compatible. So I'm afraid somebody will have to think about
whether there should be similar operations for bounded strings, too.

****************************************************************

From: Robert A. Duff
Sent: Thursday, June 13, 2002  9:27 AM

> No. The intention of this is just to discourage an implementation like
>
> procedure Put_Line (S : in Unbounded_String) is
> begin
>   Ada.Text_IO.Put_Line (To_String (S));
> end Put_Line;

It's a mistake to try to put this sort of thing in a standard.
If you want the compiler to do things efficiently, pester your
compiler vendor (preferably with checkbook in hand).  ;-)

The language definition should make it feasible, and perhaps even easy,
to do things efficiently.  But it should not try to *force* efficiency.

> (well, at least discourage it unless the compiler is smart enough to
> avoid doing in effect
>
>   declare
>      Tmp : String := To_String (S);
>   begin
>      Ada.Text_IO.Put_Line (Tmp);
>   end;
>
> That's what I meant by "extra copy". Note that I wrote "extra copy",
> not just "copy".)

OK, I have an implementation that does 37 copies, but it doesn't do the
"extra" 38'th one I was thinking of.  Is that good enough?  ;-)

My point is that defining "extra" in the context of a standard is not
feasible.  So don't waste a lot of energy trying.

Compiler writers do not deliberately try to make their products
inefficient.  Of course they cut corners to save money.  So what they
need is pressure from paying customers, so they can set their
optimization priorities right.

****************************************************************

From: Robert A. Duff
Sent: Thursday, June 13, 2002  9:39 AM

Randy said:

> Deleting bounded strings would make more sense. :-)

A year or so ago, I was writing a lexical analyzer.
I needed a buffer to keep the token text in (for identifiers
and the like), and there's a max length for tokens,
so I used Bounded_Strings, with a max length of 1000 or so.

I expected the lexer to be slower than the parser, since lexers look at
each character, whereas parsers look only at each token.  But the lexer
was 60 *times* slower, which surprised me.

After some investigation, I discovered that for each token, and for each
whitespace and comment character, it was entering the block that
declared the buffer.  One might expect that to be nearly free -- it has
to initialize the buffer length to 0.

But the implementation of Bounded_Strings initialized all 1000
characters (because some AI says "=" has to compose on these things)!

Changing it to use my own record type (length plus array of characters,
just like Bounded_String, but without the useless initialization),
increased the speed of the lexer by a factor of 100.

So much for reusable abstractions.

****************************************************************

From: Robert Dewar
Sent: Saturday, June 22, 2002  6:30 AM

> Implementation advice: the Put and Put_Line operations should be implemented
> such that no extra copy of the string data occurs.

The Ada RM is no place to put in requests for some particular optimization
that you want to see. How to spend time and effort in improving performance
of various language constructs is between vendors and the marketplace.

This particular IA is ill advised in my opinion in any case, but for sure
IA of this type does not belong.

****************************************************************

From: Robert Dewar
Sent: Saturday, June 22, 2002  7:17 AM

> Compiler writers do not deliberately try to make their products
> inefficient.  Of course they cut corners to save money.  So what they
> need is pressure from paying customers, so they can set their
> optimization priorities right.

It is not a matter of cutting corners even. A simple implementation that
does an extra copy may be far superior to a complex one that does an
extra copy if the time for the extra copy is negligible in the entire
context of performance requirements.

****************************************************************

From: Robert A. Duff
Sent: Saturday, June 22, 2002  11:55 AM

> It is not a matter of cutting corners even.

Well, perhaps "cutting corners" is a somewhat rude choice of words.

>... A simple implementation that
> does an extra copy may be far superior to a complex one that does an
> extra copy if the time for the extra copy is negligible in the entire
> context of performance requirements.

I think you're missing a "not" in the above sentence.  Amusing typo.  ;-)

Anyway, you and I obviously agree that the kind of optimization advice
being discussed does not belong in the RM.

****************************************************************

From: Robert Dewar
Sent: Sunday, June 23, 2002  6:33 AM

Here is a package that we provide with GNAT that we have found useful for
solving some of these problems

------------------------------------------------------------------------------
--                                                                          --
--                         GNAT RUNTIME COMPONENTS                          --
--                                                                          --
--            A D A . S T R I N G S . U N B O U N D E D . A U X             --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                            $Revision: 1.4 $                              --
--                                                                          --
--          Copyright (C) 1992-1998, Free Software Foundation, Inc.         --
--                                                                          --
-- GNAT is free software;  you can  redistribute it  and/or modify it under --
-- terms of the  GNU General Public License as published  by the Free Soft- --
-- ware  Foundation;  either version 2,  or (at your option) any later ver- --
-- sion.  GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY;  without even the  implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License --
-- for  more details.  You should have  received  a copy of the GNU General --
-- Public License  distributed with GNAT;  see file COPYING.  If not, write --
-- to  the Free Software Foundation,  59 Temple Place - Suite 330,  Boston, --
-- MA 02111-1307, USA.                                                      --
--                                                                          --
-- As a special exception,  if other files  instantiate  generics from this --
-- unit, or you link  this unit with other files  to produce an executable, --
-- this  unit  does not  by itself cause  the resulting  executable  to  be --
-- covered  by the  GNU  General  Public  License.  This exception does not --
-- however invalidate  any other reasons why  the executable file  might be --
-- covered by the  GNU Public License.                                      --
--                                                                          --
-- GNAT was originally developed  by the GNAT team at  New York University. --
-- It is now maintained by Ada Core Technologies Inc (http://www.gnat.com). --
--                                                                          --
------------------------------------------------------------------------------

--  This child package of Ada.Strings.Unbounded provides some specialized
--  access functions which are intended to allow more efficient use of the
--  facilities of Ada.Strings.Unbounded, particularly by other layered
--  utilities (such as GNAT.Patterns).

package Ada.Strings.Unbounded.Aux is
pragma Preelaborate (Aux);

   function Get_String (U  : Unbounded_String) return String_Access;
   pragma Inline (Get_String);
   --  This function returns the internal string pointer used in the
   --  representation of an unbounded string. There is no copy involved,
   --  so the value obtained references the same string as the original
   --  unbounded string. The characters of this string may not be modified
   --  via the returned pointer, and are valid only as long as the original
   --  unbounded string is not modified. Violating either of these two
   --  rules results in erroneous execution.
   --
   --  This function is much more efficient than the use of To_String
   --  since it avoids the need to copy the string. The lower bound of the
   --  referenced string returned by this call is always one.

   procedure Set_String (UP : in out Unbounded_String; S : String);
   pragma Inline (Set_String);
   --  This function sets the string contents of the referenced unbounded
   --  string to the given string value. It is significantly more efficient
   --  than the use of To_Unbounded_String with an assignment, since it
   --  avoids the necessity of messing with finalization chains. The lower
   --  bound of the string S is not required to be one.

   procedure Set_String (UP : in out Unbounded_String; S : String_Access);
   pragma Inline (Set_String);
   --  This version of Set_String takes a string access value, rather than a
   --  string. The lower bound of the string value is required to be one, and
   --  this requirement is not checked.

end Ada.Strings.Unbounded.Aux;


----------------------
-- REVISION HISTORY --
----------------------

--  ----------------------------
--  revision 1.1
--  date: 1997/01/25 15:24:52;  author: dewar;  state: Exp;
--  Initial revision
--  ----------------------------
--  revision 1.2
--  date: 1997/01/26 20:31:05;  author: dewar;
--  Add pragma Inline for Set_String
--  (Set_String): New version taking a String_Access value
--  ----------------------------
--  revision 1.3
--  date: 1998/04/27 12:14:21;  author: dewar;
--  Remove unused withs
--  Add missing copyright line to header
--  ----------------------------
--  New changes after this line.  Each line starts with: "--  "

****************************************************************

From the minutes of the Vienna meeting:

Randy explains that the readability of programs using unbounded strings is a
problem, because you have to convert to type String to do anything interesting.

Jean-Pierre comments that unbounded strings are really for storage; don't use
them for manipulation. That doesn't seem to be the intent expressed in the
standard.

Tucker would like to see a procedure version of To_Unbounded_String. He also
would like to add a defaulted starting parameter to all the Index functions.
Pascal immediately claims that that is not compatible.

Tucker hates making this look like an add-on. The new parameter would have to
be at the end. In that case, only renames (and overriding via derivation) would
be incompatible. These are unlikely.

There is not much interest in the slice version of the operations that were
proposed.

The group feels that I/O is generally valuable. Complex has this, and it is a
child of Text_IO. But you need access to the representation of unbounded
strings, so it appears that it has to be a child of Unbounded. Steve Baird
objects: you could have an implementation package as a child of Unbounded.

Thus, we settle on the name Ada.Text_IO.Unbounded_IO to make it like
Complex_IO. It could be a rename from an implementation package. There also
would be a wide version (Ada.Wide_Text_IO.Unbounded_IO), of course.

The From parameter will need to be added to all index functions (for Fixed,
Bounded, and Unbounded).
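
[Editor's note: A minimal sketch of the "find all occurrences" use that
motivates the From parameter. It assumes an Index function for
Unbounded_String that accepts a From parameter, as proposed here; named
association is used so that the sketch does not depend on where From ends up
in the profile. The procedure name Find_All and the sample data are
illustrative only.]

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Find_All is
   --  Illustrative data; not taken from the AI.
   Source  : constant Unbounded_String := To_Unbounded_String ("abcabcab");
   Pattern : constant String := "ab";
   Pos     : Natural := 1;
begin
   --  The loop guard keeps From within 1 .. Length (Source).
   while Pos <= Length (Source) loop
      --  Assumed overload: Index with a From parameter (the proposal).
      Pos := Index (Source => Source, Pattern => Pattern, From => Pos);
      exit when Pos = 0;  --  0 means no further match
      Ada.Text_IO.Put_Line ("Match at" & Natural'Image (Pos));
      Pos := Pos + 1;     --  resume just past the start of this match
   end loop;
end Find_All;
```

Without a From parameter, each iteration would have to convert the remainder
of Source to a String (via To_String or Slice) before searching it.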

****************************************************************

From: Adam Beneschan [adam@irvine.com]
Sent: Thursday, August 22, 2002 3:33 PM

[Editor's note: This comment was sent on a different subject, but since it
gives information about the real-world uses of Unbounded_Strings, I've also
attached it here...]

It may not be as weird as you think.  If
Ada.Strings.Unbounded.Unbounded_String (defined as untagged private) is
implemented as a child of Ada.Finalization.Controlled (which is tagged),
as suggested by the Rationale, then any package that tries to derive from
Unbounded_String will run into this situation.  (We've seen real-life
Ada code that does define types derived from Unbounded_String.)

****************************************************************

From: Nick Roberts
Sent: Saturday, August 24, 2002 5:44 PM

I disagree with the argument against using the name
"Set_Unbounded_String" instead of "To_Unbounded_String" (for the
new procedures which analogise the "To_Unbounded_String"
functions).

The argument given in AI-301 (v1.3) is that using "Set_" would
risk name collisions with existing code. However, I do not see
why this name would carry any greater risk, in general, than
"To_". Thus I would urge the use of the less confusing "Set_".

I'm otherwise enthusiastic about the extra facilities.

****************************************************************

From: Randy Brukardt
Sent: Saturday, August 24, 2002  6:20 PM

But the proposed name was "Set", not "Set_Unbounded_String". Try the argument
again with that name...

****************************************************************

From: Robert Dewar
Sent: Saturday, August 24, 2002  9:11 PM

I consider this spec awful. It is highly non-upwards compatible. I don't
think we would consider implementing something that was upwards
incompatible in this way.

Take for example, Slice which now can return either a string or
unbounded string.

Well I have all over the place overloaded functions that take either
a string or unbounded string. The following is a model of the sort
of program that will get blown up:

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
procedure Q is
   procedure Pstr (S : String) is begin null; end;
   procedure Pstr (S : Unbounded_String) is begin null; end;

   A : Unbounded_String;

begin
   Pstr (Slice (A, 1, 3));
end;

Sure, things can be mended. But mending things in a big system is not as
easy as technical folks would suppose.

I am strongly opposed to *ANY* non-upwards compatible changes, especially
when they are gratuitous as in this case.

Robert Dewar

The AI says

The new operations added to Ada.Strings.Fixed, Ada.Strings.Bounded, and
Ada.Strings.Unbounded could potentially cause new ambiguities in programs if
there is a use clause for the string package. However, this is unlikely,
and no programs change meaning (any incompatibilities cause compile-time
errors).

How did anyone decide this is unlikely?

The fact that no programs change meaning is thin comfort to people who have
to make changes to programs that they are not necessarily familiar with and
which are under strict configuration control (meaning that the process for
making ANY changes can be heavy).

To me, the ONLY acceptable way of adding functionality to existing packages
is to add child packages. Yes, it is a bit kludgy, but elegance takes a
back seat to compatibility requirements at this stage.

This makes me worry a lot. I hope this unacceptably casual view of
incompatibility is not showing up in other AI's. If it is, then I
think it makes it likely that the entire set of extensions will get
ignored.
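
[Editor's note: The ambiguity described above can be reproduced with a
current compiler by declaring a local stand-in for the proposed overload.
The local function Slice below is hypothetical; it only imitates the profile
under discussion. The qualified expression shows the kind of mending that
would be needed at each affected call.]

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Q_Ambiguous is
   procedure Pstr (S : String) is begin null; end Pstr;
   procedure Pstr (S : Unbounded_String) is begin null; end Pstr;

   --  Hypothetical stand-in for the proposed Slice returning
   --  Unbounded_String. It overloads the use-visible Slice, which
   --  returns String.
   function Slice
     (Source : Unbounded_String;
      Low    : Positive;
      High   : Natural) return Unbounded_String is
   begin
      return To_Unbounded_String
        (Ada.Strings.Unbounded.Slice (Source, Low, High));
   end Slice;

   A : constant Unbounded_String := To_Unbounded_String ("abcdef");

begin
   --  Pstr (Slice (A, 1, 3));
   --  The call above is rejected as ambiguous once both Slice
   --  functions are visible: either Pstr could apply.

   --  Qualifying the result type selects one interpretation.
   Pstr (String'(Slice (A, 1, 3)));
end Q_Ambiguous;
```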

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  6:21 PM

Yes, this example demonstrates the problem with Ada.Strings.Unbounded.

The most likely reason that PStr is overloaded in the first place is because
Slice didn't return an Unbounded string. So both versions were necessary to
avoid code explosion.

So your point is essentially that we can't fix the broken abstraction of
Ada.Strings.Unbounded because it would break the workarounds to the broken
abstraction of Ada.Strings.Unbounded. Perhaps you are right (this is only an
early draft of an AI in any case, and it will need to be discussed again).
That does seem to be a sad state of affairs if true.

In that case, my preference would be to dump Ada.Strings.Unbounded
altogether and start over (which would be compatible, I believe). Trying to
fix this with a child package would only make the embarrassment of a broken
abstraction permanent. (If you think that Ada other than GNAT is on its
deathbed anyway, perhaps it doesn't matter.)

****************************************************************

From: Robert Dewar
Sent: Monday, August 26, 2002  9:06 PM

>
> In that case, my preference would be to dump Ada.Strings.Unbounded
> altogether and start over (which would be compatible, I believe). Trying to
> fix this with a child package would only make the embarrassment of a broken
> abstraction permanent. (If you think that Ada other than GNAT is on its
> deathbed anyway, perhaps it doesn't matter.)

Actually Ada is alive and well, and we have plenty of competition :-)

I certainly would not "dump" Ada.Strings.Unbounded, that would be even more
incompatible. If you really think it is worth creating another slightly
different abstraction, go ahead. It seems like a waste of time to me. There
are more important fish to fry.

> > procedure Q is
> >    procedure Pstr (S : String) is begin null; end;
> >    procedure Pstr (S : Unbounded_String) is begin null; end;
> >
> >    A : Unbounded_String;
> >
> > begin
> >    Pstr (Slice (A, 1, 3));
> > end;
>
> Yes, this example demonstrates the problem with Ada.Strings.Unbounded.
>
> The most likely reason that PStr is overloaded in the first place is because
> Slice didn't return an Unbounded string. So both versions were necessary to
> avoid code explosion.

Nope, the reason that Pstr operates on Strings and unbounded_strings is that
I have strings and unbounded strings in my application (and that will be true
whatever you do to "improve" unbounded_string) and I want Pstr to be easily
applied to either.

Right now, the Unbounded_String abstraction is friendly to this approach.
In addition, imagine what a mess you get with trying to concatenate slices
if you make the change to Slice.

Unbounded_String is not nearly broken enough to be worth considering non
upwards-compatible fixes.

I don't think the abstraction is broken, on the contrary in some respects I
think you are trying to break it!

Please, let's not even consider seriously non-upwards compatible changes. They
will simply get ignored at this stage, and rightly so.

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  9:41 PM

> I don't think the abstraction is broken, on the contrary in some respects I
> think you are trying to break it!

Please explain this.

I think that it should be possible to pick a single string abstraction and
stick with it without having to switch to another. Ada.Strings.Unbounded falls
far short of this; many common operations force switching to regular strings
and Ada.Strings.Fixed. Very few operations allow any sort of combination or
operations purely on unbounded strings. That is so true that Jean-Pierre Rosen
says that Ada.Strings.Unbounded is only good for storing strings. But if that
is true, why are all of those other operations there?

The only way to fix that is to add operations that can combine two unbounded
strings. Otherwise, the package is rather an embarrassment, as it is a lousy
example of an abstraction.

In my spam scanner, I tried to use Ada.Strings.Unbounded consistently with the
idea of showing Ada novices that it isn't any harder to use Ada than other
languages. Bad plan; I had to pull the strings out into Strings repeatedly to
search them, to do replacements, and other operations. Indeed, hardly anything
could be accomplished without converting to String - which is very verbose. If
I had realized that would be the case, I wouldn't have bothered with unbounded
string at all - the memory management would have been fairly simple.

Anyway, I have to agree that there are more important things to do. But it
seems unlikely to me that most of those would meet your 'perfect
compatibility' requirement.

****************************************************************

From: Thomas Wolf
Sent: Monday, August 26, 2002  7:47 AM

I note that the proposal now includes new Index functions for
Ada.Strings.Fixed. Are these needed? If you have fixed strings,
can't you just work with slices directly? I thought the Index
operations were needed only for bounded and unbounded strings.

Furthermore, I think it should be specified that these new
Index operations propagate Ada.Strings.Index_Error if
From > Length (Source).

Also, you write that there'd be "similar operations" for bounded
and unbounded strings. What does that mean? What's the type of
parameter Pattern? String or (Un)Bounded_String? Or are there to
be two versions of each Index operation, one with Pattern of type
String, and one with Pattern of type (Un)Bounded_String?

If the latter, do you plan to also add versions of the existing
Index functions (without the From parameter) where the Pattern
would be of type (Un)Bounded_String?

If not the latter, why would there be only one, but not the other
variant?

In procedure Slice, I'd change the order of parameters to

procedure Slice
  (Source : in     Unbounded_String;
   Low    : in     Positive;
   High   : in     Natural;
   Target :    out Unbounded_String);

(specify the whole slice before passing the target, similar
to Replace_Slice.)

Why is parameter Target in To_Bounded_String and To_Unbounded_String
of mode "in out" instead of "out"? And why is parameter Item in
procedure Ada.Text_IO.Unbounded_IO.Get_Line of mode "in out" and
not just "out"? In procedure Slice, Target is of mode "out"...

Other than that, it looks good to me, although I do not understand
why the ARG did not retain the originally proposed operations where
the second parameter would be a slice of an (Un)Bounded_String,
specified by the string and a low and a high index. Ok, that would
add quite a few additional variants to the interface, but that's all,
and the interface would then be really complete. As it is, one still
would have to go through intermediary explicit representations of
the slices.

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  7:25 PM

> I note that the proposal now includes new Index functions for
> Ada.Strings.Fixed. Are these needed? If you have fixed strings,
> can't you just work with slices directly? I thought the Index
> operations were needed only for bounded and unbounded strings.

This is simply for consistency. Virtually every function in Ada.Strings.Unbounded has a similar function in Ada.Strings.Fixed. It would be odd if we didn't carry that through for the new Index functionality.

> Furthermore, I think it should be specified that these new
> Index operations propagate Ada.Strings.Index_Error if
> From > Length (Source).

Yes, of course.

> Also, you write that there'd be "similar operations" for bounded
> and unbounded strings. What does that mean?

Similar in the same way that these four are defined relative to the existing ones (just adding a From parameter).

> Do you plan to also add versions of the existing
> Index functions (without the From parameter) where the Pattern
> would be of type (Un)Bounded_String?

No. "Patterns" are rather special; they typically aren't manipulated at the same time as the items being searched. So I didn't include Index and Count routines with Unbounded_String patterns.

That could be done (I wouldn't object strongly), but I worry that the compatibility concerns mentioned by Robert would be severe in this case.

> In procedure Slice, I'd change the order of parameters to
>
> procedure Slice
>   (Source : in     Unbounded_String;
>    Low    : in     Positive;
>    High   : in     Natural;
>    Target :    out Unbounded_String);
>
> (specify the whole slice before passing the target, similar
> to Replace_Slice.)

Huh? The Target (well, it's called "Source", but it's where the result is written) is the first parameter of Replace_Slice.

> Why is parameter Target in To_Bounded_String and To_Unbounded_String
> of mode "in out" instead of "out"? And why is parameter Item in
> procedure Ada.Text_IO.Unbounded_IO.Get_Line of mode "in out" and
> not just "out"? In procedure Slice, Target is of mode "out"...

Sloppy work on my part. My personal programming style never uses "out" parameters on composite types, as they are always default-initialized and that initialization must not be lost. (Practically, "out" and "in out" are the same for composite types anyway.) But that's wrong for this package.

> Other than that, it looks good to me, although I do not understand
> why the ARG did not retain the originally proposed operations where
> the second parameter would be a slice of an (Un)Bounded_String,
> specified by the string and a low and a high index. Ok, that would
> add quite a few additional variants to the interface, but that's all,
> and the interface would then be really complete. As it is, one still
> would have to go through intermediary explicit representations of
> the slices.

Most of the ARG's concern was with the broken abstraction (the fact that you have to explicitly convert to String before you can do much that is useful). Performance issues were secondary (Unbounded strings are pretty expensive in general; you won't use them much if performance is critical). The main reason for the procedure versions of functions is simply that most of the functions already have procedure versions; they all (except for the operator symbols) should have them to be consistent.

However (in my opinion), your proposed slice operations exist only to improve efficiency. They're complex (at least in appearance), they muddy the abstraction, and they add a lot of weight to the interface - mostly on little used subprograms. No one spoke in favor of them; the general reaction was "UGH".

We all have ideas that we think are great that get killed for one reason or another. It's just part of the process - it's rare that a final feature looks much like the initial proposal.

****************************************************************

From: Robert Dewar
Sent: Monday, August 26, 2002  10:30 PM

I use unbounded strings extensively in all my SPITBOL like programs using
g-spipat, and those programs work just fine.

They definitely will *NOT* work fine if you make the changes you are planning.

Once again, this is not nearly broken enough to even consider non-upwards
compatible changes.

We did not allow non-UC changes in Ada 95 except with VERY good justification,
which certainly does not apply here.

Data point: Not one of our customers ever asked questions about these aspects
of US, or complained, and believe me, they ask plenty of questions and make
plenty of complaints about the language in other respects.

The fact that someone decides something is broken is not enough reason to go
pestering it. Please point to a major user of Ada for whom this is an important
issue.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 27, 2002  9:21 AM

This proposal did not come from me, I'm just trying to write it up. I don't
know the original proposer, so I can't say what perspective he has.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 27, 2002  3:20 AM

I find Robert's argument compelling.  While I am not adamantly opposed to
non-upward compatible changes, I think the above example is something we
want to avoid, as it seems like a perfectly legitimate programming style.
It's one thing to add a new subprogram named Mess_With_Unbounded_String to a
package spec (where incompatibilities can only come from use clause
collisions, and are probably rare if the name is sufficiently convoluted).
It's an entirely different thing to add a new overload with parameter/result
types that are likely to be used in conjunction with unbounded strings.

Interestingly enough, I don't remember agreeing to changes regarding slices
during the last meeting.  That may have been overheating, but then I see in
the minutes that "there is not much interest in the slice version of the
operations that were proposed".  The ARG on the other hand was in favor of
beefing up the Index functions and of adding children I/O packages.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 27, 2002  9:21 AM

> It's an entirely different thing to add a new overload with parameter/result
> types that are likely to be used in conjunction with unbounded strings.

OK, but in that case Robert is right - we can't add anything to any of the
string packages (which includes new Index routines). And I don't think any of
this is compelling enough to add child packages.

> Interestingly enough, I don't remember agreeing to changes regarding slices
> during the last meeting.  That may have been overheating, but then I see in
> the minutes that "there is not much interest in the slice version of the
> operations that were proposed".  The ARG on the other hand was in favor of
> beefing up the Index functions and of adding children I/O packages.

"Slice version" are the key words. The original proposal had slice versions of
the routines like Delete and Insert, as well as ones taking Unbounded_String. I
don't believe that there was much discussion on the orthogonality routines
(taking Unbounded_String parameters), but I'm sure that we didn't decide to
drop them. The "Slice" function is one of the things that should return an
Unbounded_String. (Note that in my opinion, this routine never should have
returned String in the first place; the problem we have now is because of that
mistake.)

Anyway, without fixing the orthogonality, I don't see much benefit to this AI
at all (especially since the Index changes also cannot be allowed); I'd vote it
No Action in that case.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 27, 2002  10:30 AM

This may be a point that has been mentioned before, but I feel it
ought to be mooted (again).

I believe the primary rationale for introducing the
Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
standard was based on the assumption that compiler implementors
could implement these packages more efficiently (using 'insider
knowledge' and/or special machine code) than would be possible by
writing them in 'pure' (portable) Ada.

I suspect this rationale could be challenged (especially in the
context of RISC targets). I am myself dubious about it. Were
there other strong reasons for these packages being part of the
standard?

Would there be some sense in the idea of actually removing these
packages from the next revision of the standard? Obviously they
would have to remain available in practice, but their ongoing
specification (and maybe testing) could fall under the
jurisdiction of something separate from the (main) Ada language
standard itself.

Given the difficulties, in terms of time and manpower, the ARG
has in developing the standard (with all respect), would this
perhaps be a more pragmatic approach? I believe there are some
who feel this is the way for a 'containers' facility to be added
to Ada.

I personally don't have strong feelings one way or the other, but
it's perhaps something to consider.

We are no doubt all agreed that 'versionism' is evil. However, it
may be the lesser of two (or the least of several) evils to
specify the string packages in two versions -- the original (as
it is now) and the new (as improved by the proposal) -- and
permit implementations to support either, or both (selected by
some switch, option, or pragma perhaps), or even neither. In this
case, it may be easier if they are no longer specified in the
main Ada standard. Again, this is just an idea to consider.

If I'm retreading already trodden ground, my apologies.

****************************************************************

From: Bob Duff
Sent: Tuesday, August 27, 2002  4:46 PM

> I believe the primary rationale for introducing the
> Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
> standard was based on the assumption that compiler implementors
> could implement these packages more efficiently (using 'insider
> knowledge' and/or special machine code) than would be possible by
> writing them in 'pure' (portable) Ada.

I don't think that was the main reason.  I think these packages are
included because they provide generally-useful functionality that would
be useful to make portable.  In part, they are an answer to the
complaint that the predefined String can't do X, Y, and Z, whereas in
language Mumble, that functionality is standardly available.

> Would there be some sense in the idea of actually removing these
> packages from the next revision of the standard? Obviously they
> would have to remain available in practice, but their ongoing
> specification (and maybe testing) could fall under the
> jurisdiction of something separate from the (main) Ada language
> standard itself.

I don't like the idea of removing them.

Of course, anybody can create a better version of these packages, if
they like.

****************************************************************

From: Robert Dewar
Sent: Tuesday, August 27, 2002  4:59 PM

It is completely unacceptable to even consider removing useful functionality
from the standard. This package Strings.Unbounded is in wide use and it would
be unthinkable to remove it from the standard. It would give an impression of
a standards process that had run amok!

****************************************************************

From: Robert Dewar
Sent: Tuesday, August 27, 2002  4:28 PM

I find the idea of a child I/O package reasonable. GNAT has provided that
for some time. I assume the GNAT spec is in hand in this discussion?
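
[Editor's note: For reference, a sketch of how the child package name settled
on in Vienna might be used. The procedure forms of Get_Line and Put_Line
taking an Unbounded_String are assumptions based on the draft discussed
above, whose exact profiles were still open. Echo_Lines is an illustrative
name.]

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;

procedure Echo_Lines is
   Line : Unbounded_String;
begin
   --  Copy standard input to standard output one line at a time,
   --  without choosing a maximum line length and without any
   --  explicit conversion to String.
   while not Ada.Text_IO.End_Of_File loop
      Ada.Text_IO.Unbounded_IO.Get_Line (Line);  --  assumed profile
      Ada.Text_IO.Unbounded_IO.Put_Line (Line);  --  assumed profile
   end loop;
end Echo_Lines;
```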

****************************************************************

