Version 1.12 of ais/ai-00301.txt


!standard A.4.3(8)          04-05-25 AI95-00301/08
!standard A.4.3(56)
!standard A.4.3(58)
!standard A.4.3(60)
!standard A.4.4(12)
!standard A.4.4(28)
!standard A.4.4(43)
!standard A.4.4(92)
!standard A.4.4(101/1)
!standard A.4.5(11)
!standard A.4.5(22)
!standard A.4.5(38)
!standard A.4.5(79)
!standard A.4.5(82)
!standard A.10.1(48)
!standard A.10.7(17)
!standard A.10.11(00)
!standard A.11(03)
!class amendment 02-06-12
!status Amendment 200Y 04-01-13
!status WG9 Approved 04-06-18
!status ARG Approved 9-0-2 03-12-12
!status work item 02-06-12
!status received 02-06-12
!priority Medium
!difficulty Easy
!subject Operations on language-defined string types
!summary
Additional Index functions are added to Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded with From parameters so that partial string searches can be made. Additional subprograms are added to Ada.Strings.Unbounded and Ada.Strings.Bounded to avoid unnecessary conversions to and from other types. A Get_Line function is added to Text_IO. I/O operations on unbounded strings are provided in a new child package of Ada.Text_IO.
!problem
The package Ada.Strings.Unbounded contains an incomplete set of facilities. Many of these omissions mean that it is necessary to convert back and forth to String (thus leaving the Unbounded_String abstraction) in order to handle them. In some cases, the operations have to be done with Ada.Strings.Fixed.
For instance, a commonly encountered problem is the need to find all of the occurrences of a pattern in a string. The Index function always searches the entire string. This is not a major problem for a String, as searching the appropriate slice is easy. However, for an Unbounded_String, the only practical solution is to convert the Unbounded_String to a String and use Ada.Strings.Fixed to do the operation.
Many operations in Ada.Strings.Unbounded take only String parameters or results. For instance, the Insert, Overwrite, and Replace_Slice routines do not include a version in which the new portion is an Unbounded_String. Similarly, Slice returns only a String. Thus, it is necessary to leave the Unbounded_String abstraction (by converting to String) in order to use these operations. This extra conversion is very verbose.
Another problem is that Unbounded_Strings typically are implemented as a controlled type. That means that an assignment of an Unbounded_String has substantial overhead in the form of calls to Finalize and Adjust, and probably includes allocation of memory. Unbounded_Strings are often given values with the function To_Unbounded_String. However, when this function is used in an assignment statement, memory may be allocated twice (once by the function, and once by Adjust), which is substantial extra overhead. A procedure version of To_Unbounded_String would avoid this problem.
And finally, I/O operations with Unbounded_Strings as parameters are not provided. These are commonly needed, and (in the case of Get_Line) are not trivial to write. A child package similar to the ones provided for Complex types is valuable. Function Get_Line is added to Text_IO as well for consistency.
!proposal
Additional subprograms and a package are added as follows.
Three functions Index and a function Index_Non_Blank with an additional parameter From are added to Strings.Fixed, Strings.Bounded and Strings.Unbounded.
A procedure Set_Bounded_String and procedure and function Bounded_Slice are added to Strings.Bounded. A procedure Set_Unbounded_String and procedure and function Unbounded_Slice are similarly added to Strings.Unbounded.
Two functions Get_Line are added to Text_IO.
A package Text_IO.Unbounded_IO is added.
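As a rough illustration of the Unbounded_String additions listed above, the following sketch uses Set_Unbounded_String and both forms of Unbounded_Slice without leaving the Unbounded_String abstraction. The procedure name Slice_Demo and the sample text are invented for the example, and it assumes an implementation that already provides the proposed subprograms.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Slice_Demo is
   Text : Unbounded_String;
   Word : Unbounded_String;
begin
   --  Procedure form: no function result needs to be assigned.
   Set_Unbounded_String (Text, "Hello, world");

   --  Function form: positions 8 .. 12 hold "world".
   Word := Unbounded_Slice (Text, 8, 12);
   Ada.Text_IO.Put_Line (To_String (Word));

   --  Procedure form: Word is overwritten with positions 1 .. 5.
   Unbounded_Slice (Text, Word, 1, 5);
   Ada.Text_IO.Put_Line (To_String (Word));
end Slice_Demo;
```

The To_String conversions appear only because plain Text_IO is used for output here.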
!wording
Add after A.4.3(8):
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in String; From : in Positive; Going : in Direction := Forward) return Natural;
Add after A.4.3(56):
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
Each Index function searches, starting from From, for a slice of Source, with length Pattern'Length, that matches Pattern with respect to Mapping; the parameter Going indicates the direction of the lookup. If Going = Forward, then Index returns the smallest index I which is greater than or equal to From such that the slice of Source starting at I matches Pattern. If Going = Backward, then Index returns the largest index I such that the slice of Source starting at I matches Pattern and has an upper bound less than or equal to From. If there is no such slice, then 0 is returned. If Pattern is the null string then Pattern_Error is propagated.
AARM Note: There is no default parameter for From; the default value would need to depend on other parameters (the bounds of Source and the direction Going). It is better to use overloaded functions rather than a special value to represent the default.
(Also, move AARM A.4.3(58.a) here.)
Replace A.4.3(58) by:
If Going = Forward, returns
Index (Source, Pattern, Source'First, Forward, Mapping);
otherwise returns
Index (Source, Pattern, Source'Last, Backward, Mapping);
function Index (Source : in String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
Index searches for the first or last occurrence of any of a set of characters (when Test=Inside), or any of the complement of a set of characters (when Test=Outside). It returns the smallest index I >= From (if Going=Forward) or the largest index I <= From (if Going=Backward) such that Source(I) satisfies the Test condition with respect to Set; it returns 0 if there is no such character in Source.
Replace A.4.3(60) by:
If Going = Forward, returns
Index (Source, Set, Source'First, Test, Forward);
otherwise returns
Index (Source, Set, Source'Last, Test, Backward);
function Index_Non_Blank (Source : in String; From : in Positive; Going : in Direction := Forward) return Natural;
Returns Index (Source, Maps.To_Set(Space), From, Outside, Going);
Add after A.4.4(12):
procedure Set_Bounded_String
(Target : out Bounded_String;
Source : in String; Drop : in Truncation := Error);
Add after A.4.4(28):
function Bounded_Slice
(Source : in Bounded_String;
Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Bounded_String;
procedure Bounded_Slice
(Source : in Bounded_String;
Target : out Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
Replace A.4.4(43) with all of the following:
-- Search subprograms:
function Index (Source : in Bounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in Bounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in Bounded_String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in Bounded_String; From : in Positive; Going : in Direction := Forward) return Natural;
Add after A.4.4(92):
procedure Set_Bounded_String
(Target : out Bounded_String;
Source : in String; Drop : in Truncation := Error);
Equivalent to Target := To_Bounded_String (Source, Drop);
Add after A.4.4(101/1):
function Bounded_Slice
(Source : in Bounded_String;
Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Bounded_String;
Returns the slice at positions Low through High in the string represented by Source as a bounded string; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source).
procedure Bounded_Slice
(Source : in Bounded_String;
Target : out Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
Equivalent to Target := Bounded_Slice (Source, Low, High, Drop);
Add after A.4.5(11):
procedure Set_Unbounded_String
(Target : out Unbounded_String;
Source : in String; Drop : in Truncation := Error);
Add after A.4.5(22):
function Unbounded_Slice
(Source : in Unbounded_String;
Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Unbounded_String;
procedure Unbounded_Slice
(Source : in Unbounded_String;
Target : out Unbounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
Add after A.4.5(38):
function Index (Source : in Unbounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in Unbounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in Unbounded_String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in Unbounded_String; From : in Positive; Going : in Direction := Forward) return Natural;
Add after A.4.5(79):
The procedure Set_Unbounded_String sets Target to an Unbounded_String that represents Source.
Add after A.4.5(82):
The function Unbounded_Slice returns the slice at positions Low through High in the string represented by Source as an Unbounded_String. The procedure Unbounded_Slice sets Target to the Unbounded_String representing the slice at positions Low through High in the string represented by Source. Both routines propagate Index_Error if Low > Length(Source)+1 or High > Length(Source).
Add after A.10.1(48):
function Get_Line(File : in File_Type) return String;
function Get_Line return String;
Add after A.10.7(17):
function Get_Line(File : in File_Type) return String;
function Get_Line return String;
Returns a result string constructed by reading successive characters from the specified input file, and assigning them to successive characters of the result string. The result string has a lower bound of 1 and an upper bound of the number of characters read. Reading stops when the end of the line is met; Skip_Line is then (in effect) called with a spacing of 1.
The exception End_Error is propagated if an attempt is made to skip a file terminator.
Add a new clause:
A.10.11 Input-output for Unbounded Strings
The package Text_IO.Unbounded_IO provides input-output in human-readable form for Unbounded_Strings.
Static Semantics
The library package Text_IO.Unbounded_IO has the following declaration:
with Ada.Strings.Unbounded;
package Ada.Text_IO.Unbounded_IO is
procedure Put (File : in File_Type; Item : in Strings.Unbounded.Unbounded_String);
procedure Put (Item : in Strings.Unbounded.Unbounded_String);
procedure Put_Line (File : in Text_IO.File_Type; Item : in Strings.Unbounded.Unbounded_String);
procedure Put_Line (Item : in Strings.Unbounded.Unbounded_String);
function Get_Line (File : in File_Type) return Strings.Unbounded.Unbounded_String;
function Get_Line return Strings.Unbounded.Unbounded_String;
procedure Get_Line (File : in File_Type; Item : out Strings.Unbounded.Unbounded_String);
procedure Get_Line (Item : out Strings.Unbounded.Unbounded_String);
end Ada.Text_IO.Unbounded_IO;
For an item of type Unbounded_String, the following subprograms are provided:
procedure Put
(File : in File_Type;
Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put (File, Strings.Unbounded.To_String(Item));
procedure Put
(Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put (Strings.Unbounded.To_String(Item));
procedure Put_Line
(File : in Text_IO.File_Type;
Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put_Line (File, Strings.Unbounded.To_String(Item));
procedure Put_Line
(Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put_Line (Strings.Unbounded.To_String(Item));
function Get_Line
(File : in File_Type)
return Strings.Unbounded.Unbounded_String;
Returns Strings.Unbounded.To_Unbounded_String(Text_IO.Get_Line(File));
function Get_Line
return Strings.Unbounded.Unbounded_String;
Returns Strings.Unbounded.To_Unbounded_String(Text_IO.Get_Line);
procedure Get_Line
(File : in File_Type; Item : out Strings.Unbounded.Unbounded_String);
Equivalent to Item := Get_Line (File);
procedure Get_Line
(Item : out Strings.Unbounded.Unbounded_String);
Equivalent to Item := Get_Line;
Add after A.11(3):
The specification of package Wide_Text_IO.Wide_Unbounded_IO is the same as that for Text_IO.Unbounded_IO, except that any occurrence of Unbounded_String is replaced by Wide_Unbounded_String, and any occurrence of package Unbounded is replaced by Wide_Unbounded.
!discussion
Ada.Strings.Unbounded was not really intended to be a complete abstraction. Rather, it was designed to provide a convenient storage representation mechanism for strings. Even so, it is a bad idea to have to use multiple string packages in a single expression, and many common operations require that.
The new operations added to Ada.Strings.Fixed, Ada.Strings.Bounded, and Ada.Strings.Unbounded could potentially cause new ambiguities in programs if there is a use clause for the string package. However, this is unlikely (as the new routines have relatively lengthy names), and no programs change meaning (any incompatibilities cause compile-time errors).
We considered adding these new routines as child packages in order to eliminate compatibility problems. However, these packages would have looked like an ugly add-on. And, choosing a name is difficult. Some of the names proposed include Ada.Strings.Unbounded.Additional_Operations, Ada.Strings.Unbounded.Enhancements, Ada.Strings.Unbounded.More, Ada.Strings.Unbounded.Unbounded_Operations, or Ada.Strings.Unbounded.Stuff_that_should_have_been_here_all_along. (OK, the last is a joke.) Thus, we did not follow this alternative.
An earlier proposal for the Index functions was to add a defaulted From parameter to the end of the existing routine. This would look like:
function Index (Source : in String; Pattern : in String; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity; From : in Positive := 1) return Natural;
This has the advantage of not adding new routines. However, there are several serious drawbacks. First, this is significantly more incompatible than just adding new routines. Adding a parameter like this affects renamings and overridings. One compiler vendor notes that they've seen derivations of Unbounded_Strings in their customers' code.
Second, the correct default value is not clear. For Ada.Strings.Fixed, it depends on the bounds of Source and Going. Since a default value cannot depend on the values of other parameters, it is not possible to write the correct default value. For Ada.Strings.Bounded and Ada.Strings.Unbounded, the value depends on Going and Length(Source), which again is a problem. We could define the parameter to have type Natural, and define 0 to mean the start or end of Source as needed, but that seems very ugly and confusing.
Thus, we added new routines with the From parameter as the last non-defaulted parameter.
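To make the dependence on the bounds of Source and on Going concrete, here is a small sketch that calls the new Ada.Strings.Fixed.Index with an explicit From on a string whose lower bound is not 1. The procedure name Index_From_Demo and the sample data are invented for the example; the values in the comments follow from the wording proposed for A.4.3(56).

```ada
with Ada.Strings;       use Ada.Strings;
with Ada.Strings.Fixed;
with Ada.Text_IO;

procedure Index_From_Demo is
   --  "abc" occupies positions 5 .. 7, 9 .. 11 and 13 .. 15.
   Source    : constant String (5 .. 15) := "abc abc abc";
   First_Hit : Natural;
   Last_Hit  : Natural;
begin
   --  A forward search has to start at Source'First ...
   First_Hit := Ada.Strings.Fixed.Index
     (Source, "abc", From => Source'First, Going => Forward);

   --  ... while a backward search has to start at Source'Last.
   Last_Hit := Ada.Strings.Fixed.Index
     (Source, "abc", From => Source'Last, Going => Backward);

   Ada.Text_IO.Put_Line (Natural'Image (First_Hit));  --  5
   Ada.Text_IO.Put_Line (Natural'Image (Last_Hit));   --  13
end Index_From_Demo;
```

No single static default for From would serve both calls, which is why the From parameter has no default.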
Ada.Text_IO.Unbounded_IO.Get_Line can be implemented as follows:
   procedure Get_Line (File : in File_Type;
                       Item : out Ada.Strings.Unbounded.Unbounded_String) is
      Buffer : String (1 .. 80);
      Last   : Natural;
   begin
      Get_Line (File, Buffer, Last);
      Ada.Strings.Unbounded.Set_Unbounded_String (Item, Buffer(1..Last));
      -- A completely filled buffer means the line may continue.
      while Last = Buffer'Last loop
         Get_Line (File, Buffer, Last);
         Ada.Strings.Unbounded.Append (Item, Buffer(1..Last));
      end loop;
   end Get_Line;
However, all of the I/O operations can be better implemented if they have access to the internal representation of the Unbounded_String type. That can be accomplished with an implementation-defined child package of Ada.Strings.Unbounded.
Ada.Text_IO.Unbounded_IO is defined as a child of Text_IO to match the already existing Ada.Text_IO.Complex_IO. This makes the definition of a similar package for Bounded_Strings complex. It could be defined something like:
   with Ada.Strings.Bounded;
   generic
      with package Bounded is new Ada.Strings.Bounded.Generic_Bounded_Length (<>);
   package Ada.Text_IO.Bounded_IO is
      procedure Put (File : in File_Type; Item : in Bounded.Bounded_String);
      ...
   end Ada.Text_IO.Bounded_IO;
but this didn't seem to have sufficient benefit to mandate. (An implementation can add this package if it likes.) Note that we're never going to be totally consistent here, as Ada.Text_IO.Fixed_IO already exists for another purpose, and such a package would not make any sense anyway, as Text_IO already includes the needed operations.
!example
Consider the task of finding all occurrences of a given string Pattern in a given Unbounded_String Source. Currently, one either has to convert the whole unbounded string into a string and search on that, or implement the search oneself (ignoring character mappings here):
   declare
      Src_Length : constant Natural := Length (Source);
      From       : Positive := 1;
   begin
      while From + Pattern'Length - 1 <= Src_Length loop
         if Slice (Source, From, From + Pattern'Length - 1) = Pattern then
            -- Found an occurrence of Pattern; process it, then advance From
            From := From + Pattern'Length; -- No overlapping occurrences!
         else
            From := From + 1;
         end if;
      end loop;
   end;
This incurs quite a lot of extra storage overhead because the (necessary!) call to Slice is required to return a copy of the slice of Source.Reference. With the proposed new Index functions, this could be simplified to:
   declare
      Src_Length : constant Natural := Length (Source);
      From       : Positive := 1;
      Idx        : Natural;
   begin
      while From + Pattern'Length - 1 <= Src_Length loop
         Idx := Ada.Strings.Unbounded.Index (Source, Pattern, From);
         exit when Idx = 0;
         -- Found an occurrence of Pattern starting at Idx.
         -- Process this occurrence, then advance From
         From := Idx + Pattern'Length; -- No overlapping occurrences!
      end loop;
   end;
which may be much more efficient because there are no intermediary copies to String involved.
The same is true for writing an Unbounded_String. Currently, one has to do
Ada.Text_IO.Put_Line (Ada.Strings.Unbounded.To_String (My_Unbounded_String));
which also incurs this extra intermediary String representation, which could be avoided with the new operation
Ada.Text_IO.Unbounded_IO.Put_Line (My_Unbounded_String);
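Reading works the same way. As a minimal sketch, assuming an implementation that provides the proposed Ada.Text_IO.Unbounded_IO package, the following program copies standard input to standard output line by line without declaring any fixed-size buffer. The procedure name Echo_Lines is invented for the example.

```ada
with Ada.Strings.Unbounded;
with Ada.Text_IO;
with Ada.Text_IO.Unbounded_IO;

procedure Echo_Lines is
   Line : Ada.Strings.Unbounded.Unbounded_String;
begin
   while not Ada.Text_IO.End_Of_File loop
      --  Reads a whole line, however long it is.
      Ada.Text_IO.Unbounded_IO.Get_Line (Line);
      Ada.Text_IO.Unbounded_IO.Put_Line (Line);
   end loop;
end Echo_Lines;
```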
!corrigendum A.4.3(8)
Insert after the paragraph:
-- Search subprograms
the new paragraphs:
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in String; From : in Positive; Going : in Direction := Forward) return Natural;
!corrigendum A.4.3(56)
Insert after the paragraph:
the new paragraphs:
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
Each Index function searches, starting from From, for a slice of Source, with length Pattern'Length, that matches Pattern with respect to Mapping; the parameter Going indicates the direction of the lookup. If Going = Forward, then Index returns the smallest index I which is greater than or equal to From such that the slice of Source starting at I matches Pattern. If Going = Backward, then Index returns the largest index I such that the slice of Source starting at I matches Pattern and has an upper bound less than or equal to From. If there is no such slice, then 0 is returned. If Pattern is the null string then Pattern_Error is propagated.
!corrigendum A.4.3(58)
Replace the paragraph:
Each Index function searches for a slice of Source, with length Pattern'Length, that matches Pattern with respect to Mapping; the parameter Going indicates the direction of the lookup. If Going = Forward, then Index returns the smallest index I such that the slice of Source starting at I matches Pattern. If Going = Backward, then Index returns the largest index I such that the slice of Source starting at I matches Pattern. If there is no such slice, then 0 is returned. If Pattern is the null string then Pattern_Error is propagated.
by:
If Going = Forward, returns
Index (Source, Pattern, Source'First, Forward, Mapping);
otherwise returns
Index (Source, Pattern, Source'Last, Backward, Mapping);
function Index (Source : in String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
Index searches for the first or last occurrence of any of a set of characters (when Test=Inside), or any of the complement of a set of characters (when Test=Outside). It returns the smallest index I >= From (if Going=Forward) or the largest index I <= From (if Going=Backward) such that Source(I) satisfies the Test condition with respect to Set; it returns 0 if there is no such Character in Source.
!corrigendum A.4.3(60)
Replace the paragraph:
Index searches for the first or last occurrence of any of a set of characters (when Test=Inside), or any of the complement of a set of characters (when Test=Outside). It returns the smallest index I (if Going=Forward) or the largest index I (if Going=Backward) such that Source(I) satisfies the Test condition with respect to Set; it returns 0 if there is no such Character in Source.
by:
If Going = Forward, returns
Index (Source, Set, Source'First, Test, Forward);
otherwise returns
Index (Source, Set, Source'Last, Test, Backward);
function Index_Non_Blank (Source : in String; From : in Positive; Going : in Direction := Forward) return Natural;
Returns Index (Source, Maps.To_Set(Space), From, Outside, Going);
!corrigendum A.4.4(12)
Insert after the paragraph:
function To_String (Source : in Bounded_String) return String;
the new paragraphs:
procedure Set_Bounded_String (Target : out Bounded_String; Source : in String; Drop : in Truncation := Error);
!corrigendum A.4.4(28)
Insert after the paragraph:
function Slice (Source : in Bounded_String; Low : in Positive; High : in Natural) return String;
the new paragraphs:
function Bounded_Slice (Source : in Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Bounded_String;
procedure Bounded_Slice (Source : in Bounded_String; Target : out Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
!corrigendum A.4.4(43)
Replace the paragraph:
-- Search functions
by:
-- Search subprograms
function Index (Source : in Bounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in Bounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in Bounded_String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in Bounded_String; From : in Positive; Going : in Direction := Forward) return Natural;
!corrigendum A.4.4(92)
Insert after the paragraph:
To_String returns the String value with lower bound 1 represented by Source. If B is a Bounded_String, then B = To_Bounded_String(To_String(B)).
the new paragraphs:
procedure Set_Bounded_String (Target : out Bounded_String; Source : in String; Drop : in Truncation := Error);
Equivalent to Target := To_Bounded_String (Source, Drop);
!corrigendum A.4.4(101/1)
Insert after the paragraph:
Returns the slice at positions Low through High in the string represented by Source; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source).
the new paragraphs:
function Bounded_Slice (Source : in Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Bounded_String;
Returns the slice at positions Low through High in the string represented by Source as a bounded string; propagates Index_Error if Low > Length(Source)+1 or High > Length(Source).
procedure Bounded_Slice (Source : in Bounded_String; Target : out Bounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
Equivalent to Target := Bounded_Slice (Source, Low, High, Drop);
!corrigendum A.4.5(11)
Insert after the paragraph:
function To_String (Source : in Unbounded_String) return String;
the new paragraphs:
procedure Set_Unbounded_String (Target : out Unbounded_String; Source : in String; Drop : in Truncation := Error);
!corrigendum A.4.5(22)
Insert after the paragraph:
function Slice (Source : in Unbounded_String; Low : in Positive; High : in Natural) return String;
the new paragraphs:
function Unbounded_Slice (Source : in Unbounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error) return Unbounded_String;
procedure Unbounded_Slice (Source : in Unbounded_String; Target : out Unbounded_String; Low : in Positive; High : in Natural; Drop : in Truncation := Error);
!corrigendum A.4.5(38)
Insert after the paragraph:
-- Search subprograms
the new paragraphs:
function Index (Source : in Unbounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping := Maps.Identity) return Natural;
function Index (Source : in Unbounded_String; Pattern : in String; From : in Positive; Going : in Direction := Forward; Mapping : in Maps.Character_Mapping_Function) return Natural;
function Index (Source : in Unbounded_String; Set : in Maps.Character_Set; From : in Positive; Test : in Membership := Inside; Going : in Direction := Forward) return Natural;
function Index_Non_Blank (Source : in Unbounded_String; From : in Positive; Going : in Direction := Forward) return Natural;
!corrigendum A.4.5(79)
Insert after the paragraph:
the new paragraph:
The procedure Set_Unbounded_String sets Target to an Unbounded_String that represents Source.
!corrigendum A.4.5(82)
Insert after the paragraph:
The Element, Replace_Element, and Slice subprograms have the same effect as the corresponding bounded-length string subprograms.
the new paragraph:
The function Unbounded_Slice returns the slice at positions Low through High in the string represented by Source as an Unbounded_String. The procedure Unbounded_Slice sets Target to the Unbounded_String representing the slice at positions Low through High in the string represented by Source. Both routines propagate Index_Error if Low > Length(Source)+1 or High > Length(Source).
!corrigendum A.10.1(48)
Insert after the paragraph:
procedure Put(File : in File_Type; Item : in String); procedure Put(Item : in String);
the new paragraphs:
function Get_Line(File : in File_Type) return String;
function Get_Line return String;
!corrigendum A.10.7(13)
Replace the paragraph:
For an item of type String, the following procedures are provided:
by:
For an item of type String, the following subprograms are provided:
!corrigendum A.10.7(17)
Insert after the paragraph:
Determines the length of the given string and attempts that number of Put operations for successive characters of the string (in particular, no operation is performed if the string is null).
the new paragraphs:
function Get_Line(File : in File_Type) return String;
function Get_Line return String;
Returns a result string constructed by reading successive characters from the specified input file, and assigning them to successive characters of the result string. The result string has a lower bound of 1 and an upper bound of the number of characters read. Reading stops when the end of the line is met; Skip_Line is then (in effect) called with a spacing of 1.
The exception End_Error is propagated if an attempt is made to skip a file terminator.
!corrigendum A.10.11(01)
Insert new clause:
The package Text_IO.Unbounded_IO provides input-output in human-readable form for Unbounded_Strings.
Static Semantics
The library package Text_IO.Unbounded_IO has the following declaration:
with Ada.Strings.Unbounded;
package Ada.Text_IO.Unbounded_IO is
procedure Put (File : in File_Type; Item : in Strings.Unbounded.Unbounded_String);
procedure Put (Item : in Strings.Unbounded.Unbounded_String);
procedure Put_Line (File : in Text_IO.File_Type; Item : in Strings.Unbounded.Unbounded_String);
procedure Put_Line (Item : in Strings.Unbounded.Unbounded_String);
function Get_Line (File : in File_Type) return Strings.Unbounded.Unbounded_String;
function Get_Line return Strings.Unbounded.Unbounded_String;
procedure Get_Line (File : in File_Type; Item : out Strings.Unbounded.Unbounded_String);
procedure Get_Line (Item : out Strings.Unbounded.Unbounded_String);
end Ada.Text_IO.Unbounded_IO;
For an item of type Unbounded_String, the following subprograms are provided:
procedure Put (File : in File_Type; Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put (File, Strings.Unbounded.To_String(Item));
procedure Put (Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put (Strings.Unbounded.To_String(Item));
procedure Put_Line (File : in Text_IO.File_Type; Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put_Line (File, Strings.Unbounded.To_String(Item));
procedure Put_Line (Item : in Strings.Unbounded.Unbounded_String);
Equivalent to Text_IO.Put_Line (Strings.Unbounded.To_String(Item));
function Get_Line (File : in File_Type) return Strings.Unbounded.Unbounded_String;
Returns Strings.Unbounded.To_Unbounded_String(Text_IO.Get_Line(File));
function Get_Line return Strings.Unbounded.Unbounded_String;
Returns Strings.Unbounded.To_Unbounded_String(Text_IO.Get_Line);
procedure Get_Line (File : in File_Type; Item : out Strings.Unbounded.Unbounded_String);
Equivalent to Item := Get_Line (File);
procedure Get_Line (Item : out Strings.Unbounded.Unbounded_String);
Equivalent to Item := Get_Line;
!corrigendum A.11(3)
Insert after the paragraph:
Nongeneric equivalents of Wide_Text_IO.Integer_IO and Wide_Text_IO.Float_IO are provided (as for Text_IO) for each predefined numeric type, with names such as Ada.Integer_Wide_Text_IO, Ada.Long_Integer_Wide_Text_IO, Ada.Float_Wide_Text_IO, Ada.Long_Float_Wide_Text_IO.
the new paragraph:
The specification of package Wide_Text_IO.Wide_Unbounded_IO is the same as that for Text_IO.Unbounded_IO, except that any occurrence of Unbounded_String is replaced by Wide_Unbounded_String, and any occurrence of package Unbounded is replaced by Wide_Unbounded.
!ACATS test
Tests should be created to check on the implementation of this feature.
!appendix

Editor's note: The original proposal was submitted by Thomas Wolf on
Wednesday, June 12, 2002. It was edited slightly, but otherwise used intact
as the initial draft of this AI.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 12, 2002  10:14 PM

Thanks for the submission.

I've recently had similar problems using Unbounded_Strings. The fact that
you have to frequently convert to strings to get anything done is rather
annoying. I'm surprised I haven't heard about it before. (I personally
hadn't tried to use Unbounded_Strings before the recent program, a spam
scanner.)

Some comments:

I ran into a significant performance problem when I needed to test whether
the start of an unbounded string matched some other string. The simple
solution of
   if Index (Source, To_String(Pattern)) = 1 then
turned out to be horribly inefficient.

The better solution of
   if Slice (Source, 1, Length(Pattern)) = To_String(Pattern) then
is wrong if Source is too short (Constraint_Error is raised).

The correct solution of
   if Length (Source) >= Length (Pattern) and then
      Slice (Source, 1, Length(Pattern)) = To_String(Pattern) then
also is rather inefficient, because it has 5 function calls and two
unconstrained string returning function calls, which need to use heap or
other expensive memory allocation, and of course, two extra copies.

Being a compiler writer, I ended up declaring
    Is_Prefix (Source, Pattern : in Unbounded_String)
in a child package, and implemented it myself. But I don't recommend that
solution to everyone.

I also ran into the missing Index from the middle issue (which comes up when
you want to process all of the matches for the pattern), which I solved by
punting: convert to a String and use Ada.Strings.Fixed.Index.

> Also, while there is a function To_Unbounded_String, there is
> no procedural variant for this operation. However, such a
> procedure would be very useful, for it might be more efficient
> than the function.

Arguably, a procedural variant of To_String also would be useful, and it
certainly would be more efficient. (Indeed, the To_Unbounded_String
function, which returns a bounded record type, is usually cheap in most
implementations.)

> Implementation advice: all these new operations should be implemented such
> that no extra (intermediary) copy of the string (slice) data is created.

I'm not sure there is much point to this. The existing implementations of
Ada.Strings.Unbounded vary wildly in efficiency, I don't see any reason to
try to say anything more about any new functions.

I realize that you are trying to prevent people from implementing these
simply by calling Slice or To_String, but an implementor that would do that
probably has done it in many other parts of Ada.Strings.Unbounded.

...

>   function Get_Line
>     (File : in Ada.Text_IO.File_Type)
>     return Unbounded_String;
>   --  Reads up to the end of the line, returning the whole line as an
>   --  unbounded string. If a line terminator is met, Skip_Line is (in effect)
>   --  called with a spacing of 1, and the characters up to but not including
>   --  the line terminator are returned. If a file terminator is met, the
>   --  characters up to but not including the file terminator are returned,
>   --  except if the file terminator is met before any characters have been
>   --  read; in this case, Ada.IO_Exceptions.End_Error is raised.

There are two problems with this. First, you've left out the part about
what happens if the string is full. This is a real issue on compilers where
Integer is 16-bit. The standard has to cover all of the cases, even if they
are unlikely on most implementations.

Second, there is no good reason to change the semantics of Get_Line at a
file terminator. But this is based on a serious misconception about Ada (see
below). And in any case, Get_Line should operate like the Get_Line we all
expect.

> Implementation advice: the Put and Put_Line operations should be implemented
> such that no extra copy of the string data occurs.

This one is even more dubious than the first one. It's likely that the
implementation has to copy the string data for the existing Put and
Put_Line, to put it into a buffer or to pass it to the operating system.
You're saying that it can't do that? I think every implementation would have
to violate this advice, making it useless. I realize your point is that you
want the string directly written out, but again I think you have to leave
this to the implementor.


> BTW: since Ada.Text_IO.End_Of_Line returns True when a line *or a file*
> terminator are next in the input (A.10.5(13)), I believe it would make sense
> to clearly define that Ada.Text_IO.Get_Line does *not* raise End_Error upon
> the last line of a file if that last line has no line terminator, but ends
> on EOF directly. Otherwise, it may get rather difficult to read the last
> line of such a file. In other words, define "end of the line" as "line
> terminator or EOF", and Get_Line to read up to the end of the line (and
> call Skip_Line (1) if it stopped on a line terminator, but not if it
> stopped on EOF).

Why do you think this is not true? There is always a line terminator (and
page terminator) before the file terminator, see A.10(7). If a file does not
explicitly have a line terminator at the end of the file, the implementation
has to implicitly provide one. That's been true since Ada 80.


> It should also be evaluated whether similar additions as proposed above
> for bounded strings (Ada.Strings.Bounded) would make sense.

Deleting bounded strings would make more sense. :-)


In your discussion section, you probably should mention that being forced to
leave the unbounded string abstraction to do common operations substantially
weakens the abstraction. If you have to do it often enough, you begin to
wonder what exactly you are gaining by using Unbounded_String. (I doubt that
I will use Unbounded_String again, because it was as messy or messier than
simply using good old regular strings - in large part because I continually
had to convert back to String.)
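
Editor's note: A portable variant of the prefix test discussed in this
message can be written with Length and Element alone, so that neither operand
is converted to String. This is only a sketch: the Is_Prefix mentioned above
lived in a child package with access to the representation, whereas the
version below stays within the visible part of Ada.Strings.Unbounded. The
names Prefix_Demo and Header are illustrative.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Prefix_Demo is

   --  True if Source starts with Pattern. No String result is created.
   function Is_Prefix (Source, Pattern : Unbounded_String) return Boolean is
   begin
      --  Guard first, so that Element is never called out of range.
      if Length (Pattern) > Length (Source) then
         return False;
      end if;

      for I in 1 .. Length (Pattern) loop
         if Element (Source, I) /= Element (Pattern, I) then
            return False;
         end if;
      end loop;

      return True;
   end Is_Prefix;

   Header : constant Unbounded_String :=
     To_Unbounded_String ("Subject: hello");
begin
   --  Ordinary match.
   if not Is_Prefix (Header, To_Unbounded_String ("Subject:")) then
      raise Program_Error;
   end if;

   --  Source shorter than Pattern: False, and no Constraint_Error.
   if Is_Prefix (To_Unbounded_String ("Sub"), Header) then
      raise Program_Error;
   end if;

   --  The empty string is a prefix of everything.
   if not Is_Prefix (Header, Null_Unbounded_String) then
      raise Program_Error;
   end if;
end Prefix_Demo;
```

Whether this is faster than the Slice-based test depends on how cheaply the
implementation can call Element; it does avoid the two unconstrained String
results.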

****************************************************************

From: Ted Baker
Sent: Thursday, June 13, 2002  5:55 AM

I would second Randy's comments on Unbounded_String.  Back when we
were teaching Ada here I tried to introduce this package for
students to use in programming assignments.  I ended up giving up.
I reverted to an Ada 83-style package of my own construction in
which the (length, string) components were both visible.
Otherwise, the students ended up doing so many copying conversion
operations that I felt embarrassed.  I was embarrassed because the
less perceptive students might think I meant to teach that
sloppiness about recopying arrays is good style, and the better
students might conclude that Ada is an inherently inefficient
language.

****************************************************************

From: Thomas Wolf
Sent: Thursday, June 13, 2002  5:20 AM

On 12 Jun 2002 at 22:14, Randy Brukardt wrote:

> I've recently had similar problems using Unbounded_Strings.

Good to see that I'm not alone with this!

> Second, there is no good reason to change the semantics of Get_Line at a
> file terminator. But this is based on a serious misconception about Ada (see
> below). And in any case, Get_Line should operate like the Get_Line we all
> expect.

About the misconception, see below.

> > Implementation advice: the Put and Put_Line operations should be
> > implemented such that no extra copy of the string data occurs.
>
> This one is even more dubious than the first one. It's likely that the
> implementation has to copy the string data for the existing Put and
> Put_Line, to put it into a buffer or to pass it to the operating system.
> You're saying that it can't do that? I think every implementation would have
> to violate this advice, making it useless. I realize your point is that you
> want the string directly written out, but again I think you have to leave
> this to the implementor.

No. The intention of this is just to discourage an implementation like

procedure Put_Line (S : in Unbounded_String) is
begin
  Ada.Text_IO.Put_Line (To_String (S));
end Put_Line;

(well, at least discourage it unless the compiler is smart enough to
avoid doing in effect

  declare
     Tmp : String := To_String (S);
  begin
     Ada.Text_IO.Put_Line (Tmp);
  end;

That's what I meant by "extra copy". Note that I wrote "extra copy",
not just "copy".)

> Why do you think this is not true? There is always a line terminator (and
> page terminator) before the file terminator, see A.10(7). If a file does not
> explicitly have a line terminator at the end of the file, the implementation
> has to implicitly provide one. That's been true since Ada 80.

Indeed. I misread A.10(7) and was tricked into thinking it wasn't so
because I have come across at least one implementation that didn't do
it this way and failed horribly on the last line of a file if that line
didn't have an end-of-line. But you're right, the whole issue is moot.

> > It should also be evaluated whether similar additions as proposed above
> > for bounded strings (Ada.Strings.Bounded) would make sense.
>
> Deleting bounded strings would make more sense. :-)

I do see the smiley, but much as I'd like to see it go, it wouldn't be
backwards compatible. So I'm afraid somebody will have to think about
whether there should be similar operations for bounded strings, too.

****************************************************************

From: Robert A. Duff
Sent: Thursday, June 13, 2002  9:27 AM

> No. The intention of this is just to discourage an implementation like
>
> procedure Put_Line (S : in Unbounded_String) is
> begin
>   Ada.Text_IO.Put_Line (To_String (S));
> end Put_Line;

It's a mistake to try to put this sort of thing in a standard.
If you want the compiler to do things efficiently, pester your
compiler vendor (preferably with checkbook in hand).  ;-)

The language definition should make it feasible, and perhaps even easy,
to do things efficiently.  But it should not try to *force* efficiency.

> (well, at least discourage it unless the compiler is smart enough to
> avoid doing in effect
>
>   declare
>      Tmp : String := To_String (S);
>   begin
>      Ada.Text_IO.Put_Line (Tmp);
>   end;
>
> That's what I meant by "extra copy". Note that I wrote "extra copy",
> not just "copy".)

OK, I have an implementation that does 37 copies, but it doesn't do the
"extra" 38'th one I was thinking of.  Is that good enough?  ;-)

My point is that defining "extra" in the context of a standard is not
feasible.  So don't waste a lot of energy trying.

Compiler writers do not deliberately try to make their products
inefficient.  Of course they cut corners to save money.  So what they
need is pressure from paying customers, so they can set their
optimization priorities right.

****************************************************************

From: Robert A. Duff
Sent: Thursday, June 13, 2002  9:39 AM

Randy said:

> Deleting bounded strings would make more sense. :-)

A year or so ago, I was writing a lexical analyzer.
I needed a buffer to keep the token text in (for identifiers
and the like), and there's a max length for tokens,
so I used Bounded_Strings, with a max length of 1000 or so.

I expected the lexer to be slower than the parser, since lexers look at
each character, whereas parsers look only at each token.  But the lexer
was 60 *times* slower, which surprised me.

After some investigation, I discovered that for each token, and for each
whitespace and comment character, it was entering the block that
declared the buffer.  One might expect that to be nearly free -- it has
to initialize the buffer length to 0.

But the implementation of Bounded_Strings initialized all 1000
characters (because some AI says "=" has to compose on these things)!

Changing it to use my own record type (length plus array of characters,
just like Bounded_String, but without the useless initialization),
increased the speed of the lexer by a factor of 100.

So much for reusable abstractions.
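
Editor's note: The kind of hand-written record described in this message
might look like the sketch below. The type and subprogram names are
assumptions, not the code that was actually used. Only Length has a default
value, so entering the block that declares a buffer does not touch the 1000
characters.

```ada
procedure Token_Buffer_Demo is

   Max_Token_Length : constant := 1000;

   type Token_Buffer is record
      Length : Natural := 0;
      Data   : String (1 .. Max_Token_Length);  --  deliberately uninitialized
   end record;

   procedure Append (Buffer : in out Token_Buffer; Item : Character) is
   begin
      if Buffer.Length = Max_Token_Length then
         raise Constraint_Error;  --  token longer than the buffer allows
      end if;
      Buffer.Length := Buffer.Length + 1;
      Buffer.Data (Buffer.Length) := Item;
   end Append;

   --  Only the first Length characters are meaningful.
   function Text (Buffer : Token_Buffer) return String is
   begin
      return Buffer.Data (1 .. Buffer.Length);
   end Text;

begin
   --  Re-enter the declaring block several times, as a lexer would do
   --  once per token.
   for Pass in 1 .. 3 loop
      declare
         Buffer : Token_Buffer;  --  costs one assignment: Length := 0
      begin
         Append (Buffer, 'i');
         Append (Buffer, 'f');
         if Text (Buffer) /= "if" then
            raise Program_Error;
         end if;
      end;
   end loop;
end Token_Buffer_Demo;
```

The price of skipping the initialization is that the predefined "=" on
Token_Buffer also compares the unused tail of Data, so two buffers should be
compared through Text rather than directly.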

****************************************************************

From: Robert Dewar
Sent: Saturday, June 22, 2002  6:30 AM

> Implementation advice: the Put and Put_Line operations should be implemented
> such that no extra copy of the string data occurs.

The Ada RM is no place to put in requests for some particular optimization
that you want to see. How to spend time and effort in improving performance
of various language constructs is between vendors and the marketplace.

This particular IA is ill-advised in my opinion in any case, but for sure
IA of this type does not belong.

****************************************************************

From: Robert Dewar
Sent: Saturday, June 22, 2002  7:17 AM

> Compiler writers do not deliberately try to make their products
> inefficient.  Of course they cut corners to save money.  So what they
> need is pressure from paying customers, so they can set their
> optimization priorities right.

It is not a matter of cutting corners even. A simple implementation that
does an extra copy may be far superior to a complex one that does an
extra copy if the time for the extra copy is negligible in the entire
context of performance requirements.

****************************************************************

From: Robert A. Duff
Sent: Saturday, June 22, 2002  11:55 AM

> It is not a matter of cutting corners even.

Well, perhaps "cutting corners" is a somewhat rude choice of words.

>... A simple implementation that
> does an extra copy may be far superior to a complex one that does an
> extra copy if the time for the extra copy is negligible in the entire
> context of performance requirements.

I think you're missing a "not" in the above sentence.  Amusing typo.  ;-)

Anyway, you and I obviously agree that the kind of optimization advice
being discussed does not belong in the RM.

****************************************************************

From: Robert Dewar
Sent: Sunday, June 23, 2002  6:33 AM

Here is a package that we provide with GNAT that we have found useful for
solving some of these problems

------------------------------------------------------------------------------
--                                                                          --
--                         GNAT RUNTIME COMPONENTS                          --
--                                                                          --
--            A D A . S T R I N G S . U N B O U N D E D . A U X             --
--                                                                          --
--                                 S p e c                                  --
--                                                                          --
--                            $Revision: 1.12 $                              --
--                                                                          --
--          Copyright (C) 1992-1998, Free Software Foundation, Inc.         --
--                                                                          --
-- GNAT is free software;  you can  redistribute it  and/or modify it under --
-- terms of the  GNU General Public License as published  by the Free Soft- --
-- ware  Foundation;  either version 2,  or (at your option) any later ver- --
-- sion.  GNAT is distributed in the hope that it will be useful, but WITH- --
-- OUT ANY WARRANTY;  without even the  implied warranty of MERCHANTABILITY --
-- or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License --
-- for  more details.  You should have  received  a copy of the GNU General --
-- Public License  distributed with GNAT;  see file COPYING.  If not, write --
-- to  the Free Software Foundation,  59 Temple Place - Suite 330,  Boston, --
-- MA 02111-1307, USA.                                                      --
--                                                                          --
-- As a special exception,  if other files  instantiate  generics from this --
-- unit, or you link  this unit with other files  to produce an executable, --
-- this  unit  does not  by itself cause  the resulting  executable  to  be --
-- covered  by the  GNU  General  Public  License.  This exception does not --
-- however invalidate  any other reasons why  the executable file  might be --
-- covered by the  GNU Public License.                                      --
--                                                                          --
-- GNAT was originally developed  by the GNAT team at  New York University. --
-- It is now maintained by Ada Core Technologies Inc (http://www.gnat.com). --
--                                                                          --
------------------------------------------------------------------------------

--  This child package of Ada.Strings.Unbounded provides some specialized
--  access functions which are intended to allow more efficient use of the
--  facilities of Ada.Strings.Unbounded, particularly by other layered
--  utilities (such as GNAT.Patterns).

package Ada.Strings.Unbounded.Aux is
pragma Preelaborate (Aux);

   function Get_String (U  : Unbounded_String) return String_Access;
   pragma Inline (Get_String);
   --  This function returns the internal string pointer used in the
   --  representation of an unbounded string. There is no copy involved,
   --  so the value obtained references the same string as the original
   --  unbounded string. The characters of this string may not be modified
   --  via the returned pointer, and are valid only as long as the original
   --  unbounded string is not modified. Violating either of these two
   --  rules results in erroneous execution.
   --
   --  This function is much more efficient than the use of To_String
   --  since it avoids the need to copy the string. The lower bound of the
   --  referenced string returned by this call is always one.

   procedure Set_String (UP : in out Unbounded_String; S : String);
   pragma Inline (Set_String);
   --  This procedure sets the string contents of the referenced unbounded
   --  string to the given string value. It is significantly more efficient
   --  than the use of To_Unbounded_String with an assignment, since it
   --  avoids the necessity of messing with finalization chains. The lower
   --  bound of the string S is not required to be one.

   procedure Set_String (UP : in out Unbounded_String; S : String_Access);
   pragma Inline (Set_String);
   --  This version of Set_String takes a string access value, rather than a
   --  string. The lower bound of the string value is required to be one, and
   --  this requirement is not checked.

end Ada.Strings.Unbounded.Aux;


----------------------
-- REVISION HISTORY --
----------------------

--  ----------------------------
--  revision 1.1
--  date: 1997/01/25 15:24:52;  author: dewar;  state: Exp;
--  Initial revision
--  ----------------------------
--  revision 1.2
--  date: 1997/01/26 20:31:05;  author: dewar;
--  Add pragma Inline for Set_String
--  (Set_String): New version taking a String_Access value
--  ----------------------------
--  revision 1.3
--  date: 1998/04/27 12:14:21;  author: dewar;
--  Remove unused withs
--  Add missing copyright line to header
--  ----------------------------
--  New changes after this line.  Each line starts with: "--  "

****************************************************************

From the minutes of the Vienna meeting:

Randy explains that the readability of programs using unbounded strings is a
problem, because you have to convert to type String to do anything interesting.

Jean-Pierre comments that unbounded strings are really for storage; don't use
them for manipulation. That doesn't seem to be the intent expressed in the
standard.

Tucker would like to see a procedure version of To_Unbounded_String. He also
would like to add a defaulted starting parameter to all the Index functions.
Pascal immediately claims that that is not compatible.

Tucker hates making this look like an add-on. The new parameter would have to
be at the end. In that case, only renames (and overriding via derivation) would
be incompatible. These are unlikely.

There is not much interest in the slice version of the operations that were
proposed.

The group feels that I/O is generally valuable. Complex has this, and it is a
child of Text_IO. But you need access to the representation of unbounded
string. So it appears that it has to be a child of Unbounded. Steve Baird
objects, you could have an implementation package as a child of Unbounded.

Thus, we settle on the name Ada.Text_IO.Unbounded_IO to make it like
Complex_IO. It could be a rename from an implementation package. There also
would be a wide version (Ada.Wide_Text_IO.Unbounded_IO), of course.

The From parameter will need to be added to all index functions (for Fixed,
Bounded, and Unbounded).
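
Editor's note: The use case behind the From parameter, processing every match
of a pattern without converting to String, can be sketched as follows. The
code assumes a compiler whose Ada.Strings.Unbounded already has the Index
function with a From parameter; Count_Matches is a made-up name, and it
counts non-overlapping matches only.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Count_Matches_Demo is

   function Count_Matches
     (Source : Unbounded_String; Pattern : String) return Natural
   is
      From  : Positive := 1;
      Pos   : Natural;
      Found : Natural := 0;
   begin
      --  Index raises Pattern_Error for a null pattern; treat as no match.
      if Pattern'Length = 0 then
         return 0;
      end if;

      --  Keeping From within 1 .. Length (Source) avoids Index_Error.
      while From <= Length (Source) loop
         Pos := Index (Source, Pattern, From);
         exit when Pos = 0;
         Found := Found + 1;
         From  := Pos + Pattern'Length;  --  resume just past this match
      end loop;

      return Found;
   end Count_Matches;

begin
   if Count_Matches (To_Unbounded_String ("abcabcab"), "ab") /= 3 then
      raise Program_Error;
   end if;

   if Count_Matches (To_Unbounded_String ("aaaa"), "aa") /= 2 then
      raise Program_Error;
   end if;

   if Count_Matches (To_Unbounded_String ("abc"), "x") /= 0 then
      raise Program_Error;
   end if;
end Count_Matches_Demo;
```

Replacing the last assignment with From := Pos + 1 would count overlapping
matches instead.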

****************************************************************

From: Adam Beneschan [adam@irvine.com]
Sent: Thursday, August 22, 2002 3:33 PM

[Editor's note: This comment was sent on a different subject, but since it
gives information about the real-world uses of Unbounded_Strings, I've also
attached it here...]

It may not be as weird as you think.  If Ada.Strings.Unbounded.-
Unbounded_String (defined as untagged private) is implemented as a
child of Ada.Finalization.Controlled (which is tagged), as suggested
by the Rationale, then any package that tries to derive from
Unbounded_String will run into this situation.  (We've seen real-life
Ada code that does define types derived from Unbounded_String.)

****************************************************************

From: Nick Roberts
Sent: Saturday, August 24, 2002 5:44 PM

I disagree with the argument against using the name
"Set_Unbounded_String" instead of "To_Unbounded_String" (for the
new procedures which analogise the "To_Unbounded_String"
functions).

The argument given in AI-301 (v1.3) is that using "Set_" would
risk name collisions with existing code. However, I do not see
why this name would carry any greater risk, in general, than
"To_". Thus I would urge the use of the less confusing "Set_".

I'm otherwise enthusiastic about the extra facilities.

****************************************************************

From: Randy Brukardt
Sent: Saturday, August 24, 2002  6:20 PM

But the proposed name was "Set", not "Set_Unbounded_String". Try the argument
again with that name...

****************************************************************

From: Robert Dewar
Sent: Saturday, August 24, 2002  9:11 PM

I consider this spec awful. It is highly non-upwards compatible. I don't
think we would consider implementing something that was upwards
incompatible in this way.

Take for example, Slice which now can return either a string or
unbounded string.

Well I have all over the place overloaded functions that take either
a string or unbounded string. The following is a model of the sort
of program that will get blown up:

with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
procedure Q is
   procedure Pstr (S : String) is begin null; end;
   procedure Pstr (S : Unbounded_String) is begin null; end;

   A : Unbounded_String;

begin
   Pstr (Slice (A, 1, 3));
end;

Sure, things can be mended. But mending things in a big system is not as
easy as technical folks would suppose.

I am strongly opposed to *ANY* non-upwards compatible changes, especially
when they are gratuitous as in this case.

Robert Dewar

The AI says

The new operations added to Ada.Strings.Fixed, Ada.Strings.Bounded, and
Ada.Strings.Unbounded could potentially cause new ambiguities in programs if
there is a use clause for the string package. However, this is unlikely,
and no programs change meaning (any incompatibilities cause compile-time
errors).

How did anyone decide this is unlikely?

The fact that no programs change meaning is thin comfort to people who have
to make changes to programs that they are not necessarily familiar with and
which are under strict configuration control (meaning that the process for
making ANY changes can be heavy).

To me, the ONLY acceptable way of adding functionality to existing packages
is to add child packages. Yes, it is a bit kludgy, but elegance takes a
back seat to compatibility requirements at this stage.

This makes me worry a lot. I hope this unacceptably casual view of
incompatibility is not showing up in other AI's. If it is, then I
think it makes it likely that the entire set of extensions will get
ignored.
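
Editor's note: The ambiguity in procedure Q above, and the usual repair, can
be reproduced with current library packages by declaring a local stand-in for
the proposed function. In the sketch below the local Slice returning
Unbounded_String is hypothetical and exists only to recreate the overloading
under discussion; the counter is there so that the program can check which
Pstr was called.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Slice_Overload_Demo is

   --  Hypothetical stand-in for the proposed overloading of Slice.
   --  It does not hide the library Slice, because the result types differ.
   function Slice
     (Source : Unbounded_String;
      Low    : Positive;
      High   : Natural) return Unbounded_String is
   begin
      return To_Unbounded_String
        (Ada.Strings.Unbounded.Slice (Source, Low, High));
   end Slice;

   String_Calls : Natural := 0;

   procedure Pstr (S : String) is
   begin
      String_Calls := String_Calls + 1;
   end Pstr;

   procedure Pstr (S : Unbounded_String) is
   begin
      null;
   end Pstr;

   A : constant Unbounded_String := To_Unbounded_String ("abcdef");

begin
   --  Pstr (Slice (A, 1, 3));
   --  The call above would now be rejected as ambiguous: both Slice
   --  functions and both Pstr procedures are candidates.

   --  A qualified expression selects the String-returning Slice.
   Pstr (String'(Slice (A, 1, 3)));

   if String_Calls /= 1 then
      raise Program_Error;
   end if;
end Slice_Overload_Demo;
```

The repair is mechanical, but, as the message points out, it still has to be
applied at every affected call in code that may be under strict configuration
control.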

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  6:21 PM

Yes, this example demonstrates the problem with Ada.Strings.Unbounded.

The most likely reason that PStr is overloaded in the first place is because
Slice didn't return an Unbounded string. So both versions were necessary to
avoid code explosion.

So your point is essentially that we can't fix the broken abstraction of
Ada.Strings.Unbounded because it would break the workarounds to the broken
abstraction of Ada.Strings.Unbounded. Perhaps you are right (this is only an
early draft of an AI in any case, and it will need to be discussed again).
That does seem to be a sad state of affairs if true.

In that case, my preference would be to dump Ada.Strings.Unbounded
altogether and start over (which would be compatible, I believe). Trying to
fix this with a child package would only make the embarrassment of a broken
abstraction permanent. (If you think that Ada other than GNAT is on its
deathbed anyway, perhaps it doesn't matter.)

****************************************************************

From: Robert Dewar
Sent: Monday, August 26, 2002  9:06 PM

>
> In that case, my preference would be to dump Ada.Strings.Unbounded
> altogether and start over (which would be compatible, I believe). Trying to
> fix this with a child package would only make the embarrassment of a broken
> abstraction permanent. (If you think that Ada other than GNAT is on its
> deathbed anyway, perhaps it doesn't matter.)

Actually Ada is alive and well, and we have plenty of competition :-)

I certainly would not "dump" Ada.Strings.Unbounded, that would be even more
incompatible. If you really think it is worth creating another slightly
different abstraction, go ahead. It seems like a waste of time to me. There
are more important fish to fry.

> > procedure Q is
> >    procedure Pstr (S : String) is begin null; end;
> >    procedure Pstr (S : Unbounded_String) is begin null; end;
> >
> >    A : Unbounded_String;
> >
> > begin
> >    Pstr (Slice (A, 1, 3));
> > end;
>
> Yes, this example demonstrates the problem with Ada.Strings.Unbounded.
>
> The most likely reason that PStr is overloaded in the first place is because
> Slice didn't return an Unbounded string. So both versions were necessary to
> avoid code explosion.

Nope, the reason that Pstr operates on String's and unbounded_strings is that
I have strings and unbounded strings in my application (and that will be true
whatever you do to "improve" unbounded_string) and I want Pstr to be easily
applied to either.

Right now, the Unbounded_String abstraction is friendly to this approach.
In addition, imagine what a mess you get with trying to concatenate slices
if you make the change to Slice.

Unbounded_String is not nearly broken enough to be worth considering non
upwards-compatible fixes.

I don't think the abstraction is broken, on the contrary in some respects I
think you are trying to break it!

Please, let's not even consider seriously non-upwards compatible changes. They
will simply get ignored at this stage, and rightly so.

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  9:41 PM

> I don't think the abstraction is broken, on the contrary in some respects I
> think you are trying to break it!

Please explain this.

I think that it should be possible to pick a single string abstraction and
stick with it without having to switch to another. Ada.Strings.Unbounded falls
far short of this, many common operations force switching to regular strings
and Ada.Strings.Fixed. Very few operations allow any sort of combination or
operations purely on unbounded strings. That's so true that Jean-Pierre Rosen
says that Ada.Strings.Unbounded is only good for storing strings. But if that
is true, why are all of those other operations there?

The only way to fix that is to add operations that can combine two unbounded
strings. Otherwise, the package is rather an embarrassment, as it is a lousy
example of an abstraction.

In my spam scanner, I tried to use Ada.Strings.Unbounded consistently with the
idea of showing Ada novices that it isn't any harder to use Ada than other
languages. Bad plan; I had to pull the strings out into Strings repeatedly to
search them, to do replacements, and other operations. Indeed, hardly anything
could be accomplished without converting to String - which is very verbose. If
I had realized that would be the case, I wouldn't have bothered with unbounded
string at all - the memory management would have been fairly simple.

Anyway, I have to agree that there are more important things to do. But it
seems unlikely to me that most of those would meet your 'perfect
compatibility' requirement.

****************************************************************

From: Thomas Wolf
Sent: Monday, August 26, 2002  7:47 AM

I note that the proposal now includes new Index functions for
Ada.Strings.Fixed. Are these needed? If you have fixed strings,
can't you just work with slices directly? I thought the Index
operations were needed only for bounded and unbounded strings.

Furthermore, I think it should be specified that these new
Index operations propagate Ada.Strings.Index_Error if
From > Length (Source).

Also, you write that there'd be "similar operations" for bounded
and unbounded strings. What does that mean? What's the type of
parameter Pattern? String or (Un)Bounded_String? Or are there to
be two versions of each Index operation, one with Pattern of type
String, and one with Pattern of type (Un)Bounded_String?

If the latter, do you plan to also add versions of the existing
Index functions (without the From parameter) where the Pattern
would be of type (Un)Bounded_String?

If not the latter, why would there be only one, but not the other
variant?

In procedure Slice, I'd change the order of parameters to

procedure Slice
  (Source : in     Unbounded_String;
   Low    : in     Positive;
   High   : in     Natural;
   Target :    out Unbounded_String);

(specify the whole slice before passing the target, similar
to Replace_Slice.)

Why is parameter Target in To_Bounded_String and To_Unbounded_String
of mode "in out" instead of "out"? And why is parameter Item in
procedure Ada.Text_IO.Unbounded_IO.Get_Line of mode "in out" and
not just "out"? In procedure Slice, Target is of mode "out"...

Other than that, it looks good to me, although I do not understand
why the ARG did not retain the originally proposed operations where
the second parameter would be a slice of an (Un)Bounded_String,
specified by the string and a low and a high index. Ok, that would
add quite a few additional variants to the interface, but that's all,
and the interface would then be really complete. As it is, one still
would have to go through intermediary explicit representations of
the slices.

****************************************************************

From: Randy Brukardt
Sent: Monday, August 26, 2002  7:25 PM

> I note that the proposal now includes new Index functions for
> Ada.Strings.Fixed. Are these needed? If you have fixed strings,
> can't you just work with slices directly? I thought the Index
> operations were needed only for bounded and unbounded strings.

This is simply for consistency. Virtually every function in
Ada.Strings.Unbounded has a similar function in Ada.Strings.Fixed. It would be
odd if we didn't carry that through for the new Index functionality.

> Furthermore, I think it should be specified that these new
> Index operations propagate Ada.Strings.Index_Error if
> From > Length (Source).

Yes, of course.

> Also, you write that there'd be "similar operations" for bounded
> and unbounded strings. What does that mean?

Similar in the same way that these four are similar to the existing ones (just
adding a From parameter).
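
As a rough illustration of what such a From parameter buys for unbounded
strings, the sketch below finds every occurrence of a pattern without
converting to String. It assumes an Index overload of the shape discussed
here (Source, Pattern, From, with the usual defaulted Going and Mapping),
which is the form later adopted for Ada.Strings.Unbounded in Ada 2005. The
procedure name and the sample data are made up for the example.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Find_All is
   --  Hypothetical sample data, chosen only for illustration.
   Source  : constant Unbounded_String := To_Unbounded_String ("abracadabra");
   Pattern : constant String           := "abra";
   From    : Positive := 1;
   Pos     : Natural;
begin
   --  The guard keeps From within 1 .. Length (Source), so the search
   --  never starts past the end of the string.
   while From <= Length (Source) loop
      Pos := Index (Source, Pattern, From);  --  Going defaults to Forward
      exit when Pos = 0;                      --  no further occurrence
      Put_Line ("match at" & Natural'Image (Pos));
      From := Pos + 1;                        --  resume just past this match
   end loop;
end Find_All;
```

Resuming at Pos + 1 reports overlapping matches as well; resuming at
Pos + Pattern'Length would skip them.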

> Do you plan to also add versions of the existing
> Index functions (without the From parameter) where the Pattern
> would be of type (Un)Bounded_String?

No. "Patterns" are rather special; they typically aren't manipulated at the
same time as the items being searched. So I didn't include Index and Count
routines with Unbounded_String patterns.

That could be done (I wouldn't object strongly), but I worry that the
compatibility concerns mentioned by Robert would be severe in this case.

> In procedure Slice, I'd change the order of parameters to
>
> procedure Slice
>   (Source : in     Unbounded_String;
>    Low    : in     Positive;
>    High   : in     Natural;
>    Target :    out Unbounded_String);
>
> (specify the whole slice before passing the target, similar
> to Replace_Slice.)

Huh? The Target (well, it's called "Source", but it's where the result is
written) is the first parameter of Replace_Slice.

> Why is parameter Target in To_Bounded_String and To_Unbounded_String
> of mode "in out" instead of "out"? And why is parameter Item in
> procedure Ada.Text_IO.Unbounded_IO.Get_Line of mode "in out" and
> not just "out"? In procedure Slice, Target is of mode "out"...

Sloppy work on my part. My personal programming style never uses "out"
parameters on composite types, as they are always default-initialized and that
initialization must not be lost. (Practically, "out" and "in out" are the same
for composite types anyway.) But that's wrong for this package.

> Other than that, it looks good to me, although I do not understand
> why the ARG did not retain the originally proposed operations where
> the second parameter would be a slice of an (Un)Bounded_String,
> specified by the string and a low and a high index. Ok, that would
> add quite a few additional variants to the interface, but that's all,
> and the interface would then be really complete. As it is, one still
> would have to go through intermediary explicit representations of
> the slices.

Most of the ARG's concern was with the broken abstraction (the fact that you
have to explicitly convert to String before you can do much useful).
Performance issues were secondary (Unbounded strings are pretty expensive in
general; you won't use them much if performance is critical). The main reason
for the procedure versions of functions is simply that most of the functions
already have procedure versions; they all (except for the operator symbols)
should have them to be consistent.

However (in my opinion), your proposed slice operations exist only to improve
efficiency. They're complex (at least in appearance), they muddy the
abstraction, and they add a lot of weight to the interface - mostly on little
used subprograms. No one spoke in favor of them; the general reaction was
"UGH".

We all have ideas that we think are great that get killed for one reason or
another. It's just part of the process - it's rare that a final feature looks
much like the initial proposal.

****************************************************************

From: Robert Dewar
Sent: Monday, August 26, 2002  10:30 PM

I use unbounded strings extensively in all my SPITBOL like programs using
g-spipat, and those programs work just fine.

They definitely will *NOT* work fine if you make the changes you are planning.

Once again, this is not nearly broken enough to even consider non-upwards
compatible changes.

We did not allow non-UC changes in Ada 95 except with VERY good justification,
which certainly does not apply here.

Data point: Not one of our customers ever asked questions about these aspects
of US, or complained, and believe me, they ask plenty of questions and make
plenty of complaints about the language in other respects.

The fact that someone decides something is broken is not enough reason to go
pestering it. Please point to a major user of Ada for whom this is an important
issue.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 27, 2002  9:21 AM

This proposal did not come from me, I'm just trying to write it up. I don't
know the original proposer, so I can't say what perspective he has.

****************************************************************

From: Pascal Leroy
Sent: Tuesday, August 27, 2002  3:20 AM

I find Robert's argument compelling.  While I am not adamantly opposed to
non-upward compatible changes, I think the above example is something we
want to avoid, as it seems like a perfectly legitimate programming style.
It's one thing to add a new subprogram named Mess_With_Unbounded_String to a
package spec (where incompatibilities can only come from use clause
collisions, and are probably rare if the name is sufficiently convoluted).
It's an entirely different thing to add a new overload with parameter/result
types that are likely to be used in conjunction with unbounded strings.

Interestingly enough, I don't remember agreeing to changes regarding slices
during the last meeting.  That may have been overheating, but then I see in
the minutes that "there is not much interest in the slice version of the
operations that were proposed".  The ARG on the other hand was in favor of
beefing up the Index functions and of adding children I/O packages.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, August 27, 2002  9:21 AM

> It's an entirely different thing to add a new overload with parameter/result
> types that are likely to be used in conjunction with unbounded strings.

OK, but in that case Robert is right - we can't add anything to any of the
string packages (which includes new Index routines). And I don't think any of
this is compelling enough to add child packages.

> Interestingly enough, I don't remember agreeing to changes regarding slices
> during the last meeting.  That may have been overheating, but then I see in
> the minutes that "there is not much interest in the slice version of the
> operations that were proposed".  The ARG on the other hand was in favor of
> beefing up the Index functions and of adding children I/O packages.

"Slice version" is the key words. The original proposal had slice versions of
the routines like Delete and Insert, as well as ones taking Unbounded_String. I
don't believe that there was much discussion on the orthogonality routines
(taking Unbounded_String parameters), but I'm sure that we didn't decide to
drop them. The "Slice" function is one of the things that should return an
Unbounded_String. (Note that in my opinion, this routine never should have
returned String in the first place; the problem we have now is because of that
mistake.)

Anyway, without fixing the orthogonality, I don't see much benefit to this AI
at all (especially since the Index changes also cannot be allowed); I'd vote it
No Action in that case.

****************************************************************

From: Nick Roberts
Sent: Tuesday, August 27, 2002  10:30 AM

This may be a point that has been mentioned before, but I feel it
ought to be mooted (again).

I believe the primary rationale for introducing the
Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
standard was based on the assumption that compiler implementors
could implement these packages more efficiently (using 'insider
knowledge' and/or special machine code) than would be possible by
writing them in 'pure' (portable) Ada.

I suspect this rationale could be challenged (especially in the
context of RISC targets). I am myself dubious about it. Were
there other strong reasons for these packages being part of the
standard?

Would there be some sense in the idea of actually removing these
packages from the next revision of the standard? Obviously they
would have to remain available in practice, but their ongoing
specification (and maybe testing) could fall under the
jurisdiction of something separate from the (main) Ada language
standard itself.

Given the difficulties, in terms of time and manpower, the ARG
has in developing the standard (with all respect), would this
perhaps be a more pragmatic approach? I believe there are some
who feel this is the way for a 'containers' facility to be added
to Ada.

I personally don't have strong feelings one way or the other, but
it's perhaps something to consider.

We are no doubt all agreed that 'versionism' is evil. However, it
may be the lesser of two (or the least of several) evils to
specify the string packages in two versions -- the original (as
it is now) and the new (as improved by the proposal) -- and
permit implementations to support either, or both (selected by
some switch, option, or pragma perhaps), or even neither. In this
case, it may be easier if they are no longer specified in the
main Ada standard. Again, this is just an idea to consider.

If I'm retreading already trodden ground, my apologies.

****************************************************************

From: Bob Duff
Sent: Tuesday, August 27, 2002  4:46 PM

> I believe the primary rationale for introducing the
> Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
> standard was based on the assumption that compiler implementors
> could implement these packages more efficiently (using 'insider
> knowledge' and/or special machine code) than would be possible by
> writing them in 'pure' (portable) Ada.

I don't think that was the main reason.  I think these packages are
included because they provide generally-useful functionality that would
be useful to make portable.  In part, they are an answer to the
complaint that the predefined String can't do X, Y, and Z, whereas in
language Mumble, that functionality is standardly available.

> Would there be some sense in the idea of actually removing these
> packages from the next revision of the standard? Obviously they
> would have to remain available in practice, but their ongoing
> specification (and maybe testing) could fall under the
> jurisdiction of something separate from the (main) Ada language
> standard itself.

I don't like the idea of removing them.

Of course, anybody can create a better version of these packages, if
they like.

****************************************************************

From: Robert Dewar
Sent: Tuesday, August 27, 2002  4:59 PM

It is completely unacceptable to even consider removing useful functionality
from the standard. This package Strings.Unbounded is in wide use and it would
be unthinkable to remove it from the standard. It would give an impression of
a standards process that had run amok!

****************************************************************

From: Robert Dewar
Sent: Tuesday, August 27, 2002  4:28 PM

I find the idea of a child I/O package reasonable. GNAT has provided that
for some time. I assume the GNAT spec is in hand in this discussion?

****************************************************************

From: Robert Eachus
Sent: Thursday, August 29, 2002  2:26 PM

Nick Roberts wrote

>I believe the primary rationale for introducing the
>Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
>standard was based on the assumption that compiler implementors
>could implement these packages more efficiently (using 'insider
>knowledge' and/or special machine code) than would be possible by
>writing them in 'pure' (portable) Ada.
>
I widely distributed a package that became the basis for
Ada.Strings.Bounded, and it was about one page of specification and a body
that was not much longer.  But I think that what a lot of people who
used that package, or one very similar, missed was that the functions
and operations that returned (only) String did so for a reason.

The problem is best illustrated without reference to any specific
package.  If you write Put_Line(A & B & C), you don't want to be
ambushed by ambiguity.  The problem is that if you define all patterns
of function "&" for String and Bounded_String, you are lost.  My rule
was to overload the right parameter so that, in the above if B or C (or
both) is a Bounded_String it is parsed as (A & B) & C.  If A is a
Bounded_String, you can write Put_Line("" & A & B & C) or
Put_Line(To_String(A) & B & C).  An irritation, but much better than the
disaster you get if you add more overloadings.  The way I did it was
usable in the presence of multiple used instantiations of Bounded_String.
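
A minimal sketch of that rule, using a made-up fixed-capacity type
(Demo_String, Demo_Strings and To_Demo are hypothetical names, not part of
any standard package): only the right operand of "&" is overloaded, and the
result is always String, so a concatenation chain resolves exactly one way.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Concat_Demo is

   package Demo_Strings is
      type Demo_String is private;
      function To_Demo (S : String) return Demo_String;
      function To_String (X : Demo_String) return String;
      --  The only "&" provided: String on the left, String result.
      function "&" (Left : String; Right : Demo_String) return String;
   private
      type Demo_String is record
         Data : String (1 .. 80);   --  arbitrary capacity for the sketch
         Len  : Natural := 0;
      end record;
   end Demo_Strings;

   package body Demo_Strings is
      function To_Demo (S : String) return Demo_String is
         R : Demo_String;
      begin
         --  Raises Constraint_Error if S is longer than the capacity.
         R.Data (1 .. S'Length) := S;
         R.Len := S'Length;
         return R;
      end To_Demo;

      function To_String (X : Demo_String) return String is
      begin
         return X.Data (1 .. X.Len);
      end To_String;

      function "&" (Left : String; Right : Demo_String) return String is
      begin
         return Left & To_String (Right);
      end "&";
   end Demo_Strings;

   use Demo_Strings;

   A : constant String      := "Text = ";
   B : constant Demo_String := To_Demo ("hello");
   C : constant String      := ".";
begin
   Put_Line (A & B & C);    --  resolves as (A & B) & C, all of type String
   Put_Line ("" & B & C);   --  the workaround when the chain starts with B
end Concat_Demo;
```

If "&" were also declared with a Demo_String result for every operand
combination, the inner A & B could be either type and the call to Put_Line
would no longer resolve.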

Now yes, with Ada 95 and beyond, any user who wants to can declare their
own bounded and unbounded string packages without much work, or extend
the existing packages.  But there is a huge difference between the
(trivial) amount of work required, and the deep understanding of Ada
rules required.  So the packages should stay in the standard.  Anyone
who doesn't like the choices made can "roll their own" packages.  I
often do.  But I know what can work and what can't, and most Ada
programmers don't have that degree of understanding.

****************************************************************

From: Robert Dewar
Sent: Thursday, August 29, 2002  8:10 PM

Thanks for this clarification Robert (Eachus).

Indeed this makes perfect sense, and I trust the ARG will refrain from
messing up this carefully thought out design :-)

****************************************************************

From: Randy Brukardt
Sent: Thursday, August 29, 2002  8:57 PM

Robert Eachus wrote:

> The problem is best illustrated without reference to any specific
> package.  If you write Put_Line(A & B & C), you don't want to be
> ambushed by ambiguity.  The problem is that if you define all patterns
> of function "&" for String and Bounded_String, you are lost.  My rule
> was to overload the right parameter so that, in the above if B or C (or
> both) is a Bounded_String it is parsed as (A & B) & C.  If A is a
> Bounded_String, you can write Put_Line("" & A & B & C) or
> Put_Line(To_String(A) & B & C).  An irritation, but much better than the
> disaster you get if you add more overloadings.  The way I did it was
> usable in the presence of multiple used instantiations of
> Bounded_String.

Thanks for enlightening us as to (part of) the reason for a bad design. :-)

What you didn't say is why you would want any functions that returned String
other than To_String. Reading between the lines, I would guess that you were
trying to add a storage management sort of type which is essentially part of
String, rather than a new abstraction. (That would explain why your spec.
was so short.)  In that case, you would intend to do almost all operations
with operands of type String (which is indeed what happens with
Ada.Strings.Unbounded).

Of course, somewhere in the 9x process, a lot of processing routines were
added to the design, which makes it look like Ada.Strings.Unbounded is a
complete abstraction -- which it is not.

If your design had been for a complete abstraction, you wouldn't expect
Put_Line (A & B) to work for Unbounded strings (in the absence of an
Unbounded string I/O package) -- indeed, I wouldn't. I'd expect to have to
write:
    Put_Line (To_String (A & B));

So, I wouldn't have any functions that returned String other than To_String.
I'd allow String operands in as many cases as possible (in order to allow
string literals), but that would strictly be secondary, and I'd only do it
where overloading problems aren't possible.

Of course, the presence of the String returning functions in
Ada.Strings.Unbounded make it impossible to 'fix' it to have a decent
abstraction. Indeed, since the intent was that it *not* be an abstraction,
any adding of stuff to it would make it appear even more as a real (but
broken) abstraction. Thus, I think that not only should we not add anything
to the package itself, but we shouldn't add any child packages either (as
the intent is that Ada.Text_IO is good enough as it is).

(Personally, I'm going to adopt Jean-Pierre's rule: use
Ada.Strings.Unbounded only if you need storage management, and never, ever
use it in code that is intended to show the elegance of Ada.)

****************************************************************

From: Robert Dewar
Sent: Thursday, August 29, 2002  9:17 PM

>>Thanks for enlightening us as to (part of) the reason for a bad design. :-)

Well for the record, I prefer the Eachus design to the (virtual) Brukardt
one :-)

****************************************************************

From: Craig Carey
Sent: Friday, August 30, 2002  2:17 PM

At 02\08\29 20:56 -0500 Thursday, Randy Brukardt wrote:
 >Robert Eachus wrote:
 >
 >> The problem is best illustrated without reference to any specific
...
 >If your design had been for a complete abstraction, you wouldn't expect
 >Put_Line (A & B) to work for Unbounded strings (in the absence of an
 >Unbounded string I/O package) -- indeed, I wouldn't. I'd expect to have to
 >write:
 >    Put_Line (To_String (A & B));
 >
 >So, I wouldn't have any functions that returned String other than To_String.

But that suggestion that the "&" functions return an Unbounded String, is
seemingly still under the shadow of the "if" in the text "If your design
had been for a complete abstraction, ...".

Instead the "&" operators can return a plain String.
It can be concise that way, if some "To_Unbounded_String()" function is
renamed as "-": e.g.:

   P, Q, R, S : Unbounded_String;
   ...
   Output_Unbounded_String (-(P & Q & R & S));

That is very concise and faster and a reduction of the completeness of
the abstraction. The plain Strings seem so efficient and simple that
they ought have some ability to make it better to reduce the
completeness of an abstraction.


...
 >broken) abstraction. Thus, I think that not only should we not add anything
 >to the package itself, but we shouldn't add any child packages either (as
 >the intent is that Ada.Text_IO is good enough as it is).
 >
 >(Personally, I'm going to adopt Jean-Pierre's rule: use
 >Ada.Strings.Unbounded only if you need storage management, and never, ever
 >use it in code that is intended to show the elegance of Ada.)
 >
 >



At 02\08\27 17:58 -0400 Tuesday, Robert Dewar wrote:
 >> Would there be some sense in the idea of actually removing these
 >> packages from the next revision of the standard? Obviously they
 >> would have to remain available in practice, but their ongoing
 >> specification (and maybe testing) could fall under the
 >> jurisdiction of something separate from the (main) Ada language
 >> standard itself.
 >
 >It is completely unacceptable to even consider removing useful functionality
 >from the standard. This package Strings.Unbounded is in wide use and it would
 >be unthinkable to remove it from the standard. It would give an impression of
 >a standards process that had run amok!
 >

When are the compilers going to implement faster Unbounded Strings?
That seems to be what doing so would imply: eventually, at some time,
the compilers would provide faster Unbounded Strings.


Persons misled on how the standards procedure used to run should be informed.

There is a problem with the ":=" operation being slow.

Here are some timing results. The rightmost column is microseconds per
assignment statement. The strings were 500 bytes long (and there
were 400 assignments per passage through a "declare" block declaring the
string variables being assigned):

GNAT 3.14p (-O2 option)

*** Access:  X := Y       :   0.0066
*** V_Str "Assign_Fast()" :   0.0316
*** Access:  X.all:=Y.all :   0.3130
*** V_Str "Assign()"      :   0.3816
*** V_Str ":="            :   1.7724
*** Unbounded String ":=" :   1.4299

(Unbounded Str x:=y)/(V_Str Ptr Swap) = 45.25  (=1.4299/0.0316)
(V_Str "x:=y")/(V_Str "Assign(x,y)")  =  4.645 (=1.7724/0.3816)

ObjectAda 7.2.1 (some no debug option):

*** Access:  X := Y       :   0.0000
*** V_Str "Assign_Fast()" :   0.0500
*** Access:  X.all:=Y.all :   0.3000
*** V_Str "Assign()"      :   0.3497
*** V_Str ":="            :   1.2003
*** Unbounded String ":=" :   0.9500

(Unbounded Str x:=y)/(V_Str Ptr Swap) = 19.00  (=0.9500/0.0500)
(V_Str "x:=y")/(V_Str "Assign(x,y)")  =  3.432 (=1.2003/0.3497)


Thus if I avoid using Unbounded Strings, a speed improvement of as much
as 20 to 45 times becomes possible.
It depends on what fraction of the assignments can be rewritten as
swaps, with the right-hand side being lost. In one program,
data strings pass through tasks and procedures with most not
rewriting the data, and many assignments can be rewritten so that
the pointers in the fully open string records are swapped.

The ratios 4.645 and 3.432 indicate that strings have to be roughly
1-7 kilobytes in size before the time lost in copying becomes similar
to the time spent in handling overheads associated with a use of ":="
(when the type is a controlled type). How would compiler writers
fix that problem of there being too much hidden code added by
vendors' compilers? There may be big projects that made a mistake
with their choice of strings and intend to rewrite their code.
(The Apache webserver project is one that got the choice of strings
wrong, with the result that the software is slow, and it intends to correct
that by rewriting the string handling code.)

The above numbers show that my StriUnli package's ":=" is slower
than the  Unbounded Strings' ":=".

But that result reverses with Unbounded Strings showing up as worse
when the fraction of time spent initializing and finalizing both
types of strings is increased to be half of the maximum possible.

Here are results showing that.

The numbers show microseconds per assignment operation:

GNAT 3.14p

*** Access:  X := Y       :   0.6501
*** V_Str "Assign_Fast()" :   0.9729
*** Access:  X.all:=Y.all :   0.7646
*** V_Str "Assign()"      :   1.5595
*** V_Str ":="            :   2.4684
*** Unbounded String ":=" :   3.4704

Aonix 7.2.1:

*** Access:  X := Y       :   0.5051
*** V_Str "Assign_Fast()" :   0.8499
*** Access:  X.all:=Y.all :   0.5502
*** V_Str "Assign()"      :   1.2997
*** V_Str ":="            :   1.8552
*** Unbounded String ":=" :   2.5549

The timing had 2 assignments ("x:=y; y:=x;") per finalize [and per
entry and exit into a 'declare begin end' block]. The strings were
20 bytes long. Timed running in Windows 2000.

The results show that Unbounded Strings are slower. The GNAT code
seems to be simpler too. I never closely looked into that. But the
public might not be especially interested in Ada standards when it
can apparently get more advanced, simpler, up to 40 times faster
though not optimized, much less secretive strings packages, just
by downloading a file from some online archive. The public could
want to have Unbounded Strings dumped and not see a need to argue
a good case with the vendors that it should be removed instead of
being improved so that it runs faster.

Java's buffer/buckets strings allows hints to be given on how much
to allocate.

---

Currently to set the length of the V_Str S, to equal 10, in a low
level way, I would write this

    Vsr (S).all.Len := 10;

That does not seem too hard for a person who is new to Ada to learn.

[Note that S is directly changed and with Ada, there is no way to
have a 4th mode that says that a function parameter is seemingly
constant inside of the function but seemingly "in out" to the view
of the place where the function is invoked. Wouldn't time be better
spent on considering a new mode for parameters ?].



Here is code I used to get the timing results:

  http://www.ijs.co.nz/code/ada95_strings_pkg.zip

****************************************************************

From: Robert Dewar
Sent: Saturday, August 31, 2002  7:31 AM

>>When are the compilers going to implement faster Unbounded Strings?.

When paying customers find it to be a priority, so far we have not had
a single supported user who was concerned about the performance of
unbounded string.

****************************************************************

From: Randy Brukardt
Sent: Friday, August 30, 2002  6:57 PM

> Instead the "&" operators can return a plain String.
> It can be concise that way, if some "To_Unbounded_String()"
> function is renamed as "-": e.g.:
>
>    P, Q, R, S : Unbounded_String;
>    ...
>    Output_Unbounded_String (-(P & Q & R & S));
>
> That is very concise and faster and a reduction of the completeness of
> the abstraction. The plain Strings seem so efficient and simple that
> they ought have some ability to make it better to reduce the
> completeness of an abstraction.

Adding more "&" operators would be even more incompatible than what I proposed.
I'm certain that our "compatibility watchdogs" would raise quite a howl.

Admittedly, a large part of my problem is the verboseness of "To_String" and
"To_Unbounded_String" when you have to use them in virtually every unbounded
string expression. A shorter name would be welcome, but I doubt that could be
done compatibly enough to avoid screwing up existing code.

But, still I find it bizarre that Slice returns a String, and Tail returns an
Unbounded_String (Tail essentially being a specialization of Slice). Sigh.

****************************************************************

From: Robert Dewar
Sent: Friday, August 30, 2002  8:05 PM

<<But, still I find it bizarre that Slice returns a String, and Tail returns
an Unbounded_String (Tail essentially being a specialization of Slice).
Sigh.
>>

Just a little irregular, save the colorful word "bizarre" for more significant
things :-)

<<Admittedly, a large part of my problem is the verboseness of "To_String" and
"To_Unbounded_String" when you have to use them in virtually every unbounded
string expression. A shorter name would be welcome, but I doubt that could
be done compatibly enough to avoid screwing up existing code.
>>

provide renamings of "+"

Too bad JDI and RBKD could not convince people to add a conversion operator
:-)

****************************************************************

From: Bob Duff
Sent: Saturday, August 31, 2002  9:11 AM

What would the conversion operator have looked like?

****************************************************************

From: Florian Weimer
Sent: Tuesday, September 3, 2002 3:53 PM

AFAIK, the proposal involved a unary operator which was not used by
the core language, and which could be overridden by programmers.  Like
"+", but lacking the predefined meaning.

****************************************************************

From: Robert Dewar
Sent: Tuesday, September 3, 2002 3:35 PM

>>What would the conversion operator have looked like?

Our proposal was simply to allow the currency conversion symbol (also called
pillow, I forget its ISO name), it's a square with curved sides (curved in)

as an undefined unary operator, available to the user to redefine for whatever
purpose, but with the idea of stylistically reserving it for conversions
(the way some people use "+" now)

****************************************************************

From: Jean-Pierre Rosen
Sent: Wednesday, September  4, 2002  2:05 AM

And this would be a mess today, since this symbol has been reassigned to the
Euro symbol... :-)

****************************************************************

From: Robert Dewar
Sent: Wednesday, September 4, 2002  2:18 PM

NO big mess. The euro symbol as a conversion operator is perfectly acceptable
I think. The point is to choose some special character that is NOT otherwise
used in the syntax.

****************************************************************

From: Pascal Obry
Sent: Wednesday, September 4, 2002  3:06 PM

So we have at least '@' or '~', both of them are certainly better looking
(at least for Europeans) as conversion operator than the euro symbol.

****************************************************************

From: Robert Dewar
Sent: Wednesday, September 4, 2002  3:12 PM

'@' or '~' would be just fine, and arguably there are a lot of people for
whom upper half characters are a pain after all :-)

****************************************************************

From: Randy Brukardt
Sent: Wednesday, September 4, 2002  7:46 PM

I'd have to object to using '@'. It has been the conditional compilation
character in our compiler's preprocessor since the beginning of time (1981).
Virtually all of the code that we have (with the exception of Claw itself) uses
this feature extensively.

The compiler interprets '@' as an error (in canonical mode), as a space (in
condcomp on mode) or as a comment symbol ["--"] (in condcomp off mode). This
gives us the ability to build debugging and production versions of code without
changing anything.

The traditional conditional compilation solutions aren't useful for pragmas or
for context clauses, and these are the main uses for our preprocessor. For
instance, a typical unit in our compiler will start something like:

    with J2Type_Decs, J2Resolutions;
    @with J2Trace, J2Dump_Symbols, J2Dump_Types, Text_IO;
    package body J2Checks is
        pragma Debug(Off); pragma Suppress(All_Checks);
        @pragma Debug(On); pragma Unsuppress(All_Checks);

        ...

****************************************************************

From: Robert Dewar
Sent: Wednesday, September 4, 2002  8:32 PM

That seems like a weak argument. If we do decide to use @, then you can just use
@@ to mean a single @.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, September 4, 2002  9:26 PM

> That seems like a weak argument.

Of course it's weak. I don't think that feature is used a lot by our customers,
so it is mainly us that is affected.

> If we do decide to use @, then you can just use
> @@ to mean a single @.

I suppose, although that complicates the lexical analysis somewhat. And
existing programs would not be compatible (although they could be converted
with a program which would need to be aware of Ada lexical rules). Not a
trivial or painless solution.

I presume other compilers' preprocessors also use some of the "unused"
characters. Besides '@' and '~', '\' and '^' are unused (along with '[', ']',
'{', '}', and '`', which I think would be bad choices for this operation). I
wonder if any of these would cause trouble for preprocessors or other
frequently used tools?

****************************************************************

From: Craig Carey
Sent: Sunday, September  1, 2002  4:22 AM

<<But, still I find it bizarre that Slice returns a String, and Tail returns
an Unbounded_String (Tail essentially being a specialization of Slice).
Sigh.
>>

The package has other problems too.

There is a problem for programmers who remove a better strings package and
drop to use of Ada 95's Unbounded Strings.

A typical line in the source code might be this:

    X    : V_Str := ...;
    ...
    Text_IO.Put_Line ("Text = " & X & ".");  --  "&" returns a plain string

That has X be a variable length string in the package that is being
removed.

When the Ada 95 Unbounded Strings package is switched in (and the user
backs out of use of a better package) then there are two ways to rewrite
that source code line.

  (1)   Text_IO.Put_Line (+("Text = " & X & "."));
  (2)   Text_IO.Put_Line ("Text = " & (+X) & ".");

In that code, "+" is the "To_String()" function.

Option (1) would run slower so it would be avoided and option (2) would
be preferred.
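
For reference, the two rewrites can be sketched against the standard
Ada.Strings.Unbounded package as follows. The "+" renaming of To_String is
the convention assumed in this message, not something the package itself
declares, and the procedure name and sample value are invented for the
example. No claim about relative speed is made here.

```ada
with Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Plus_Demo is
   --  Assumed user-level shorthand: "+" converts Unbounded_String to String.
   function "+" (Source : Unbounded_String) return String renames To_String;

   X : constant Unbounded_String := To_Unbounded_String ("abc");
begin
   --  (1) Concatenate as Unbounded_String, then convert the whole result.
   Ada.Text_IO.Put_Line (+("Text = " & X & "."));

   --  (2) Convert X first, then concatenate plain Strings.
   Ada.Text_IO.Put_Line ("Text = " & (+X) & ".");
end Plus_Demo;
```

Both calls print the same line; they differ only in which "&" operators
are selected and in where the conversion happens.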

A problem with the 2nd is that it is inconvenient to add the extra
parentheses: "(+" and ")".

An advanced text editor that processes syntax errors can locate the 1st
perhaps, but inserting the 2nd ")" is not so easy. A text editor's
regular expressions search and replace feature may fail to allow the
2nd substring (the ")") to be correctly inserted as required.

Also the same problem with excess rewriting of source code can occur
when Unbounded Strings are being removed, if the new superior package
does not define any "&" functions.

Another problem is that of replacing most instances of "X := Y" with
"Assign(X,Y)". That does seem to be desirable and I don't have ideas
on how to get vendors to provide both sides of the assignment, and how
to get it to run faster.

[It is excellent that controlled tagged types were not slow.]

--------

Also Ada's rules do allow this:

    Z := "(1<p)" & not "p>=2"  --  define symbolically a 1-D polygon (1<p<2)

; but do not allow these:

    Z := "(1<p)" & -"p>=2"    --  alter to Z := "(1<p)" & (-"p>=2");
    Z := "(1<p)" and -"p>=2"  --  alter to Z := "(1<p)" and (-"p>=2");
    Z := "(1<p)" not "p>=2"

I can't see why "X and -Y" is not allowed. (ref. RM 4.5).


...
><<Admittedly, a large part of my problem is the verboseness of "To_String" and
>"To_Unbounded_String" when you have to use them in virtually every unbounded
>string expression. A shorter name would be welcome, but I doubt that could
>be done compatibly enough to avoid screwing up existing code.
>>>
>
>Provide renamings of "+"
>

While that may be free of syntax errors and malfunctions, there is a doubtful
area in how to pair up "+", "-", with the to-string and from-string conversions.

Once Unbounded Strings are replaced with a superior package, then the conversion
away from a plain string to the strings of the package, could occur much less
in source code.

So if an aim was to have more "+"s in the users' code than "-"s, then it
could occur that persons thinking of Unbounded Strings prefer to have "+"
rename To_Unbounded_String(), but to maximise the "+"/"-" ratio, the superior
package would differ and have "+" convert to a plain Ada 95 string. I presume it
should be right for the ideal strings instead of seeming right for the
existing Ada.Strings.Unbounded package.

Leaving Ada.Strings.Unbounded in Ada is not such a good idea since it could
be used in major projects and it does not seem to run fast enough.



>>>When are the compilers going to implement faster Unbounded Strings?.
>
>When paying customers find it to be a priority, so far we have not had
>a single supported user who was concerned about the performance of
>unbounded string.


Some of the GNAT Strings.Unbounded code is simple. E.g. here is the
Tail routine that shows that Tail() will run slower if its result is
converted to a plain string immediately after the function is called:

------------------------------------------------------------------------------
    function Tail
      (Source : Unbounded_String;
       Count  : Natural;
       Pad    : Character := Space)
       return   Unbounded_String is

    begin
       return
         To_Unbounded_String (Fixed.Tail (Source.Reference.all, Count, Pad));
    end Tail;
------------------------------------------------------------------------------

PS. The GNAT 3.14p file "a-strunb.adb" (which contains the body of the
package Ada.Strings.Unbounded) has 2.89 lines of conformant active code, per
function and procedure. [188 lines divided by 65 procedure and function
declarations].

   The lines counted excluded comments, blank lines, begin's, end's, and
   declarations (including those with an assignment).

Possibly GNAT's paying customers would not comment on a package that was
so lightweight. A complaint might not seem heavyweight.

The Tail function is implemented inefficiently since it allocates bytes
without allocating any spare space. The specs in the Reference Manual
don't require that. Its efficiency is limited by that of GNAT's
controlled types.

My timing tests showed that unnecessary copying is not the complete
problem: the slowness of ":=" is comparable (and worse if the strings are under
1000 bytes long). So ACT's paying customers might never get to complain
about Unbounded Strings since such complaints may transform into
complaints about GNAT's implementation of RM 7.6 (User-Defined Assignment
and Finalization).

Mr Dewar has used Unbounded Strings in the GNAT g-regpat.adb regular
expressions package, and if the ARG were to declare Unbounded Strings
to be deprecated then possibly other vendors could instead prefer to
use 'access Strings' or whatever is available, instead.

****************************************************************

From: Robert Dewar
Sent: Sunday, September 1, 2002  5:23 AM

<<Leaving Ada.Strings.Unbounded in Ada is not such a good idea since it could
be used in major projects and it does not seem to run fast enough.
>>

If the ARG were to remove Ada.Strings.Unbounded from Ada they would lose
all credibility in the Ada community, and no vendors would pay any attention.
When Ada 95 was designed, upwards compatibility was a very important
requirement. That requirement is even more important in any future upgrade.

In fact many people are using Ada.Strings.Unbounded for many purposes
now quite successfully. Yes, more efficient packages and implementations
are possible, but the demand for these is nonexistent as far as we are
concerned, and on the other hand, ASU is widely used in situations where
maximum efficiency is not a primary concern.

As to supposed shortcomings of the package design, that's a matter of taste.
So far the proposed modifications have all seemed to me to be plainly
undesirable.

At most it would be feasible to propose an alternative package, but I doubt
that this would gather the necessary consensus for approval.

****************************************************************

From: Robert Duff
Sent: Sunday, September  1, 2002  8:29 AM

Robert says:

> If the ARG were to remove Ada.Strings.Unbounded from Ada they would lose
> all credibility in the Ada community, and no vendors would pay any attention.

Right.  The ARG would never seriously consider such a huge
incompatibility.

****************************************************************

From: Craig Carey
Sent: Sunday, September  1, 2002  5:48 AM

>Mr Dewar has used Unbounded Strings in the GNAT g-regpat.adb regular
>expressions package, and if the ARG were to declare Unbounded Strings
>to be deprecated then possibly other vendors could instead prefer to
>use 'access Strings' or whatever is available, instead.

Correction: the  Regular Expressions packages do not use
Ada.Strings.Unbounded (this year). My comment was giving a wrong suggestion
since most GNAT specs code avoids using Unbounded Strings
(with exceptions being quite few, and including Ada.Strings.Unbounded.Text_IO
and Ada.Strings.Unbounded.Aux).

****************************************************************

From: Craig Carey
Sent: Sunday, September  1, 2002  9:18 PM

Corrections

At 02\09\01 22:48 +1200 Sunday, Craig Carey wrote:
 >At 2002\09\01 21:21 +1200 Sunday, Craig Carey wrote:
...
 >At 02\08\23 23:01 -0500 Friday, Randy Brukardt wrote:
...
 > >   http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00301.TXT
...
 >The ":=" is a poor performer even when not allocating and deallocating:

I presume that is wrong: I just timed with GNAT in Windows 2000 and a ":="
(over a controlled type that did not have user defined pointers in it),
where the Adjust() of the ":=" did nothing but was called, and it is
about the same speed as an empty Assign(X,Y) procedure.

...
 > >values with the function To_Unbounded_String. However, when this function
 > >is used in an assignment statement, memory may be allocated twice (once by
 > >the function, and once by Adjust), which is substantial extra overhead. A
 > >procedure version of To_Unbounded_String would avoid this problem.

Also, in Windows NT and Windows 2000, deallocating memory can be of a slowness
similar to the slowness of allocating memory. That could be rechecked too.
I presume it is 4 similar slowdowns and not 2, that result from the
(able to be circumvented with difficulty) RM7.6 restrictions on what Adjust()
can know.

****************************************************************

From: Alexander Kopilovitch
Sent: Sunday, September  1, 2002  2:14 PM

Generalizing the Strings/Unbounded_Strings issue, I would propose a new notion
of "enveloped" private type. That is, a private type Y may be declared as an
envelope (new keyword) of some base type X:

  type Y is private envelope of X;

Enveloping type (Y above) is required to have 2 private primitive operations:

  function Strip (Source : Y) return X;

and

  function Upgrade (Source : X) return Y;

which must be exact inverse of each other:

  Strip( Upgrade(V) ) = V  and Upgrade( Strip(W) ) = W

and their implementation is severely restricted so that compiler can verify
and guarantee these identities.

Then, a variable of enveloped type may be immediately initialized with a value
of enveloping type and vice versa, in all cases of initialization, which include:

1) declaration with initialization

   V : X := R;  -- where R is either a variable or constant of type Y
                -- or a function returning result of type Y

   W : Y := S;  -- where S is either a variable or constant of type X
                -- or a function returning result of type X

2) argument for "in" parameter of a subroutine call

   function F(A : in X; B : in Y)
   ...
   procedure P(A: in X; B : in Y)
   ...
   V : X;
   W : Y;
   ...
   ... := F(W, V);
   P(W, V);

3) argument for "out" parameter of a procedure call

   procedure P1(U : out X)
   ...
   procedure P2(U : in out X)
   ...
   W : Y;
   ...
   P1(W);
   P2(W);

In all these cases a compiler provides implicit conversions between types X
and Y using private operations Strip and Upgrade of Y.

Further, there may be several different envelopes for the same base type:

  type Y is private envelope of X;
  type Z is private envelope of X;

and one of those envelopes may be immediately used for an initialization of a
variable or parameter of another envelope type (as in the previous case above).
For example:

  procedure P(W : out Y)
  ...
  T : Z;
  ...
  P(T);

The compiler provides implicit conversions between types Y and Z using
compositions Z.Upgrade(Y.Strip(...)) and Y.Upgrade(Z.Strip(...)) .

I believe that the notion of enveloped type may be considered (to some extent)
as opposite to the notion of subtype.

****************************************************************

From: Robert Dewar
Sent: Sunday, September  1, 2002  2:19 PM

I think a proposal like Alexander's envelope proposal, especially one, which
like this one, adds a very large amount of complexity, should always be
accompanied by a motivating example worked out in detail, showing how
some problem is solved with the new feature, and what is required to solve
the same problem without the new feature.

For me, it would take a lot of convincing to accept a big feature like this,
and I see it as only a minor convenience feature. But perhaps an example
would show why I am wrong :-)

****************************************************************

From: Robert Duff
Sent: Sunday, September  1, 2002  4:06 PM

>   type Y is private envelope of X;

First, I agree with Robert that motivating examples are needed
for this sort of thing.

Second, why invent a new kind of type?  Wouldn't it be simpler to invent
a feature called "user-defined implicit conversions"?  I think that idea
has been discussed here before.

> Enveloping type (Y above) is required to have 2 private primitive operations:
>
>   function Strip (Source : Y) return X;
>
> and
>
>   function Upgrade (Source : X) return Y;
>
> which must be exact inverse of each other:
>
>   Strip( Upgrade(V) ) = V  and Upgrade( Strip(W) ) = W
>
> and their implementation is severely restricted so that compiler can verify
> and guarantee these identities.

Those severe restrictions seem difficult to define.

> Then, a variable of enveloped type may be immediately initialized with
> a value of enveloping type and vice versa, ...

Why initialization, and not assignment statements?

> 3) argument for "out" parameter of a procedure call

I presume this can only work if the thing is passed by copy?
How does this interact with "by reference" (and "return by reference")
types?

> The compiler provides implicit conversions between types Y and Z using
> compositions Z.Upgrade(Y.Strip(...)) and Y.Upgrade(Z.Strip(...)) .

Are there interactions with explicit type conversions?
Need to think about whether ambiguities can be introduced.

> I believe that the notion of enveloped type may be considered (to some
> extent) as opposite to the notion of subtype.

I don't understand the analogy -- please explain.

I'm sure generics need some corresponding changes.
Every time you change the semantics of private types,
you need to think about corresponding changes to generics.
See AARM-7.3(19.a-19.f).

Would you allow:

    type Y is new T with private envelope of X;

or some other permutation of those keywords?
That is, surely one would sometimes want to make the envelope
visibly derived from some other type.

****************************************************************

From: Alexander Kopilovitch
Sent: Wednesday, September  4, 2002  3:49 PM

>> I believe that the notion of enveloped type may be considered (to some
>> extent) as opposite to the notion of subtype.
>
>I don't understand the analogy -- please explain.

Subtyping imposes restrictions on the type, while enveloping lifts
restrictions, imposed on the type (at the cost of speed or memory, but not of
safety).

  Their essential common feature is virtually seamless interoperability with
the base type and other subtypes or envelopes of the latter.

  But yes, the analogy is not direct, because of different nature of
restrictions involved: subtyping deals with restrictions on the type's domain,
which is inherent to the type itself, while enveloping deals with the
restrictions imposed by the type's surrounding environment and provides, say,
additional lifestyles for objects of the type.

****************************************************************

From: Pascal Leroy
Sent: Monday, September  2, 2002  7:02 AM

First, let me remind everybody that this is a mailing list for discussing
possible improvements/extensions to the Ada language.  Stylistic considerations
about the optimal ratio of "+" vs "-" or about the wizardry needed to rewrite
code using a text editor have nothing to do here.  There are forums more
appropriate for this kind of chit-chat (CLA for example).

Second, let me make one thing very clear: the ARG is not, repeat not, going to
drop or obsolesce Ada.Strings.Unbounded.  The ARG could introduce minor
incompatibilities in this unit if they were to bring significant benefits,
although this would take some convincing.  It could introduce another string
package providing similar capabilities, but that would take even more
convincing.  But there is no point in discussing the removal of ASU, as this is
not going to happen in our lifetime.

This being said, I would like to make a few comments on the performance issue.

1 - Contrary to popular belief, the main reason why the predefined units are
defined in the RM is not because vendors can provide a super-efficient
implementation for them.  Predefined units are intended to provide services
that are well-defined and generally useful.  Most of the time they are
implemented in pure Ada with little or no compiler magic.  Spending a lot of
engineering effort in developing special-purpose magic for ASU is not the right
trade-off for vendors: they are better off working on improving the general
quality of their compiler and of the generated code, as it benefits all users,
not only the vanishingly small minority who critically depends on the
performance of ASU.

2 - The comparison between pointer-to-string assignments and Unbounded_Strings
assignment is entirely bogus (for the record, the ratio for Rational Apex
4.0.0b turns out to be 25, ie in the same ballpark as for GNAT and ObjectAda).
It should come as no surprise that assigning pointers is more efficient than
assigning controlled objects with the attendant storage management.  But as
soon as the pointers used for implementing the strings are exposed, there is
the risk of storage leaks, multiple deallocation or other plagues.  By
completely encapsulating the storage management issues, ASU prevents this sort
of bugs, and that's very important for critical and/or long running
applications.

3 - Although we had some customers complaining about the performance of ASU in
the '95-'96 timeframe (and we did improve it) I haven't seen a report on that
topic for years.  Actually the last set of changes that we made to ASU was
prompted by a customer who had Unbounded_Strings shared among tasks and wanted
a tasking-safe version.  My first reaction was "if we do this you are not going
to like the result because it's going to be much slower".  Their response was
interesting.  They said: "look, we are not doing any ASU operation, or any
string operation, or any operation involving heap, in the time-critical parts
of our application; but in the non-critical parts we use ASU everywhere to
guarantee that we don't have storage bugs; and we also depend for correctness
on ASU operations to be atomic wrt tasking".  We ended up making this customer
happy by providing two variants of this package.  My point is that here is a
real life application that includes both hard real-time and command and control
components, and these folks didn't give a damn about the speed of ASU.

****************************************************************

From: Robert Dewar
Sent: Monday, September  2, 2002  9:38 AM

Pascal, can you be a little more detailed about the task safety issue in
Unbounded_String? That does seem a legitimate discussion. What does the
RM require? What do implementations provide? What extra features are
desirable?

****************************************************************

From: Pascal Leroy
Sent: Monday, September  2, 2002 10:08 AM

It is actually quite unclear what the RM requires, and when the issue cropped
up we had heated internal discussions on that topic.  The paragraph in question
is A(3) which says that "the implementation shall ensure that each language
defined subprogram is reentrant in the sense that concurrent calls on the same
subprogram perform as specified, so long as all parameters that could be passed
by reference denote nonoverlapping objects."

The problematic phrase of course is "nonoverlapping objects".  If you interpret
nonoverlapping with the low-level meaning "don't share bytes" then it's hard to
imagine how the user of a private type could decide if two objects overlap
(because the private type may or may not be implemented with levels of
indirection).  A definition that would not violate the contract model of
private types would have to be able to answer the question "could these two
objects possibly overlap?" in terms of high-level Ada semantics, and that
doesn't seem straightforward.  Certainly the RM does a lot of hand waving here.

In practice our "normal" implementation uses reference counting and so is not
safe in the face of tasking (any assignment actually creates overlapping
objects) so if you execute an assignment X := Y and pass X and Y to two tasks
which modify these variables, pretty quickly you end up with a corrupted
reference count.  This is an annoyance, and we document it, but it provides the
best performance in sequential programs.  Note that implementations (like GNAT,
I believe) which do deep copy of the string can still run into trouble if for
instance two tasks execute in parallel the assignments X := Y and Y := Z.  You
could end up assigning to X the first half of Y and the second half of Z,
depending on interleaving.

In order to deal with the customer request to make ASU tasking-safe we added a
child unit of ASU (called, imaginatively, Ada.Strings.Unbounded.Rational) with
a single procedure, Make_Tasking_Safe.  This procedure turns a global boolean
which is tested when entering each subprogram in ASU to choose the appropriate
implementation.  In tasking-safe mode, we do deep copy through a (single)
protected object (it is not possible to create a tasking-safe implementation of
reference counting in Ada--at least I couldn't find a way--sigh).
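
To make the X := Y / Y := Z example concrete, here is a minimal sketch
of serializing whole-object reads and writes through a protected object.
It is a user-level wrapper, not the Ada.Strings.Unbounded.Rational code
described above; the names Guarded_String, Set, Get and Guarded_Demo are
hypothetical. It also assumes that operations on distinct
Unbounded_String objects do not interfere with each other.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Guarded_Demo is

   --  Hypothetical wrapper: every read or write of the string is a
   --  protected action, so a reader never sees a half-written value.
   protected type Guarded_String is
      procedure Set (Value : in Unbounded_String);
      function Get return Unbounded_String;
   private
      Data : Unbounded_String;
   end Guarded_String;

   protected body Guarded_String is
      procedure Set (Value : in Unbounded_String) is
      begin
         Data := Value;
      end Set;

      function Get return Unbounded_String is
      begin
         return Data;
      end Get;
   end Guarded_String;

   X, Y, Z : Guarded_String;

begin
   Y.Set (To_Unbounded_String ("old value of Y"));
   Z.Set (To_Unbounded_String ("value of Z"));

   declare
      task Copy_Y_To_X;   --  plays the role of X := Y
      task Copy_Z_To_Y;   --  plays the role of Y := Z

      task body Copy_Y_To_X is
      begin
         X.Set (Y.Get);
      end Copy_Y_To_X;

      task body Copy_Z_To_Y is
      begin
         Y.Set (Z.Get);
      end Copy_Z_To_Y;
   begin
      null;  --  leaving the block waits for both tasks
   end;

   --  Either order of the two copies is possible, but not a mixture.
   Ada.Text_IO.Put_Line (To_String (X.Get));
end Guarded_Demo;
```

Depending on interleaving, X ends up holding either the old value of Y
or the value of Z, but never half of each.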

****************************************************************

From: Florian Weimer
Sent: Monday, September  2, 2002  4:34 PM

> Note that implementations (like GNAT, I believe) which do deep copy
> of the string can still run into trouble if for instance two tasks
> execute in parallel the assignments X := Y and Y := Z.

I think this case is already defined to be erroneous in 9.11(11).

****************************************************************

From: Robert Dewar
Sent: Monday, September  2, 2002  4:50 PM

Yes, of course that is obviously erroneous, this would be obviously true
with any non-atomic type, since you have one task reading a variable at
the same time that some other task is writing it.

In other words, common sense tells us this is erroneous, we don't need
to consult the reference manual :-)

****************************************************************

From: Pascal Leroy
Sent: Tuesday, September  3, 2002  1:55 AM

The thing that bothers me is that, because Unbounded_Strings is a private
type, and because the RM doesn't specify much regarding the concurrency
behavior of ASU, the above situation may or may not be erroneous, and the
user has no way to tell.  (Unless of course she has access to the source,
which is the case with GNAT, but that's beside the point, we are talking
language definition here.)

> In other words, common sense tells us this is erroneous, we don't need
> to consult the reference manual :-)

If Unbounded_Strings are implemented a la GNAT, the assignments above are
erroneous.  If they are somehow synchronized with a protected object, there
is no erroneousness.  It would seem to be a useful thing to specify in the
RM (heck, it could merely be a documentation requirement, no need to force
implementations to change).

****************************************************************

From: Jean-Pierre Rosen
Sent: Tuesday, September  3, 2002  2:54 AM

> If Unbounded_Strings are implemented a la GNAT, the assignments above are
> erroneous.  If they are somehow synchronized with a protected object, there
> is no erroneousness.  It would seem to be a useful thing to specify in the
> RM (heck, it could merely be a documentation requirement, no need to force
> implementations to change).

I don't think this is needed. Unsynchronized access has always been
erroneous, nothing new here. Now, if an implementation behaves correctly in
erroneous conditions, it is just one of the allowed behaviours under
erroneous execution :-)

****************************************************************

From: Bob Duff
Sent: Tuesday, September  3, 2002  8:56 AM

Robert says:

> > In other words, common sense tells us this is erroneous, we don't need
> > to consult the reference manual :-)

My common sense happens to agree with Robert's in this case.
If you do X := Y and Y := Z in parallel on normal Strings,
it's erroneous.  Unbounded strings are supposed to be growable,
but in other ways, they ought to be just like Strings.

Pascal says:

> If Unbounded_Strings are implemented a la GNAT, the assignments above are
> erroneous.  If they are somehow synchronized with a protected object, there
> is no erroneousness.  It would seem to be a useful thing to specify in the
> RM...

Yes, the RM should have said so more clearly (assuming the A(3)
paragraph is insufficient).  I suppose there are other private types
where this issue arises?

>... (heck, it could merely be a documentation requirement, no need to force
> implementations to change).

I don't see any need for a documentation req't.  Calling it erroneous
doesn't force any implementation to change.

****************************************************************

From: Robert Dewar
Sent: Tuesday, September  3, 2002  3:09 PM

<<The thing that bothers me is that, because Unbounded_Strings is a private
type, and because the RM doesn't specify much regarding the concurrency
behavior of ASU, the above situation may or may not be erroneous, and the
user has no way to tell.  (Unless of course she has access to the source,
which is the case with GNAT, but that's beside the point, we are talking
language definition here.)>>

I disagree.

If you have one task writing a bounded string and one task reading the
same bounded string, that's obviously an improperly shared variable in
the sense of RM 9.10, and clearly should be considered erroneous.

Now if there are no visible shared variables of this case, then of course
simultaneous task access should work fine. If you allow weasel arguments
about the implementation sharing implicit stuff then the statement in the
RM about task safety is completely meaningless.

I don't think it is meaningless, I think it is very useful. In my opinion
any implementation of unbounded strings that does not allow different tasks
to do different things to different unbounded string objects is incorrect.

****************************************************************

From: Ted Baker
Sent: Wednesday, September  4, 2002  2:25 PM

> ... "the implementation shall ensure that each language defined
> subprogram is reentrant in the sense that concurrent calls on the same
> subprogram perform as specified, so long as all parameters that
> could be passed by reference denote nonoverlapping objects."
...
> In practice our "normal" implementation uses reference counting and so is not
> safe in the face of tasking (any assignment actually creates overlapping
> objects) so if you execute an assignment X := Y and pass X and Y to two tasks
> which modify these variables, pretty quickly you end up with a corrupted
> reference count ... Make_Tasking_Safe.  This procedure turns a global boolean
> which is tested when entering each subprogram in ASU to choose the appropriate
> implementation.  In tasking-safe mode, we do deep copy through a (single)
> protected object (it is not possible to create a tasking-safe implementation of
> reference counting in Ada--at least I couldn't find a way--sigh).

Exactly.

A fully tasking-correct implementation of
unbounded strings could not use reference semantics, and so would be
too inefficient to be of interest to anyone.  A naive programmer could
easily use these strings in ways that would indeed lead to some nasty
hard-to-isolate bugs in concurrent code.

In that sense, I agree with those who would like to see this package
"deprecated".

****************************************************************

From: Robert Dewar
Sent: Wednesday, September  4, 2002  2:39 PM

This is nonsense.

We have lots of customers using this package, for whom the performance of
the GNAT implementation is just fine. In fact we don't have any user who
has expressed any concerns about the performance of this package.

The idea of deprecating a package just because there *might* be someone
who found it inefficient is about as silly as urging that a television
channel should be removed because someone might find it boring.

If there is a need for a package with a different spec then go ahead and
propose one, but please don't waste time trying to eliminate or deprecate
the spec that is there now and which is widely used and definitely useful.

As for naive programmers, note that as far as I am concerned a correct
implementation cannot naively use reference counts because it would be
wrong.

To say that reference semantics cannot be used at all is wrong. There is
no problem in implementing reference counts that are task safe, it just
requires locks. Whether these locks represent a good time/space tradeoff
depends on how light locks are (back in the good old 8080 days, my tasking
operating system took two instructions to take or release a lock :-)
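
As a rough illustration of a lock-protected count, here is a minimal
sketch using a protected type as the lock. It is not taken from any
vendor's Ada.Strings.Unbounded; Ref_Count, Increment, Decrement, Value
and Counted_Demo are hypothetical names, and a real implementation
would also have to manage the shared buffer itself.

```ada
with Ada.Text_IO;

procedure Counted_Demo is

   --  Hypothetical reference count for one shared string buffer.
   --  Each update is a protected action, so updates cannot interleave.
   protected type Ref_Count is
      procedure Increment;
      procedure Decrement (Last_Reference : out Boolean);
      function Value return Natural;
   private
      Count : Natural := 1;  --  the object that created the buffer
   end Ref_Count;

   protected body Ref_Count is
      procedure Increment is
      begin
         Count := Count + 1;
      end Increment;

      procedure Decrement (Last_Reference : out Boolean) is
      begin
         Count := Count - 1;
         Last_Reference := Count = 0;  --  True: caller frees the buffer
      end Decrement;

      function Value return Natural is
      begin
         return Count;
      end Value;
   end Ref_Count;

   Shared : Ref_Count;

   task type Worker;

   task body Worker is
      Last : Boolean;
   begin
      for I in 1 .. 10_000 loop
         Shared.Increment;         --  a copy starts sharing the buffer
         Shared.Decrement (Last);  --  that copy is finalized again
      end loop;
   end Worker;

begin
   declare
      Workers : array (1 .. 4) of Worker;
   begin
      null;  --  leaving the block waits for all workers
   end;

   Ada.Text_IO.Put_Line ("Final count:" & Natural'Image (Shared.Value));
end Counted_Demo;
```

Because every increment is paired with a decrement, the count should be
back at 1 once all workers have finished. With an unprotected counter,
the same loop could lose updates.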

****************************************************************

From: Ted Baker
Sent: Wednesday, September  4, 2002  2:47 PM

The problem is how should a programmer (not knowing the implementation)
predict when the same objects are being accessed concurrently?

It may be "obvious" for entire strings, but it is not so obvious
to a user whether a value (say passed out of a call to a function)
is actually a substring of some other string (because the function
obtained the string by taking a substring -- e.g., Tail of some
other string).

****************************************************************

From: Robert Dewar
Sent: Wednesday, September  4, 2002  2:55 PM

If this kind of reference semantics is used, it must be transparent to the
programmer. These two objects are independent in tasking terms, and it would
be improper to reflect underlying sharing at the semantically noticeable
level.

You are thinking too much in implementation terms

Separate objects of type Unbounded_String are separate objects in the semantic
sense of tasking independence, and the implementation, whatever techniques
it uses, must preserve this semantic view.
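
A minimal sketch of the kind of program this view requires to work,
whatever sharing the implementation performs behind the assignment. The
names Independent_Demo, Modify_X and Modify_Y are made up for
illustration.

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Independent_Demo is
   X, Y : Unbounded_String;
begin
   Y := To_Unbounded_String ("seed");
   X := Y;  --  an implementation may share storage internally here

   declare
      task Modify_X;
      task Modify_Y;

      --  Each task touches only its own object.
      task body Modify_X is
      begin
         for I in 1 .. 1_000 loop
            Append (X, "x");
         end loop;
      end Modify_X;

      task body Modify_Y is
      begin
         for I in 1 .. 1_000 loop
            Append (Y, "y");
         end loop;
      end Modify_Y;
   begin
      null;  --  leaving the block waits for both tasks
   end;

   --  Under this reading, both lengths must be 4 + 1_000.
   Ada.Text_IO.Put_Line
     (Natural'Image (Length (X)) & Natural'Image (Length (Y)));
end Independent_Demo;
```

No variable is visibly shared between the two tasks, so any
synchronization needed to make this work is the implementation's
business.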

****************************************************************

From: Tucker Taft
Sent: Wednesday, September  4, 2002  3:31 PM

I agree.  The one place we "cheated" with respect to concurrency
is for the Text_IO routines that have "implicit" File_Type
parameters.  I think we have authorized implementations to
*not* use a lock just so that concurrent Put_Line's to
the default Current_Output will work.

I would think that if there are other "implicit" globals
like these that are part of the *semantic* description
of a subprogram, then concurrency need not be supported.

But I agree with Robert that globals or other sharing that
are introduced by the implementation, and are not part of
the language-defined semantics of the subprogram, will need some
kind of locking or careful updating by the implementation.
The user should *not* need to be aware of these kinds
of implementation details when writing concurrent
programs.

****************************************************************

From: Craig Carey
Sent: Thursday, September  5, 2002  1:59 AM


This is not clarifying

Robert Dewar wrote:

 >Separate objects of type Unbounded_String are separate objects in the semantic
 >sense of tasking independence, and the implementation, whatever techniques
 >it uses, must preserve this semantic view.

And also it seems to me that Ada's own strings do not follow that requirement.


It is not all that obvious on where to draw the line between task unsafe
being acceptable or otherwise.

For example, what about when an Unbounded String procedure only handles a
single Unbounded String argument?  For example:

U1, U2  : Unbounded_String;    --  Globals
Last    : Natural := 300_000_000;
A       : String (1 .. Last);


{Task 1}:      U1 := To_String (A & To_Unbounded_String (A));
                A := A (3000 .. Last) & A (1 .. 2999);

{Task 2}:      A (K) := '+';


Plain Ada Strings are not task safe.
Compilers can refuse to make an Ada String be 'atomic'.

Should there be task locking inside of the To_String() and
To_Unbounded_String() routines?

Rather than saying that "reference semantics" should be "transparent" (i.e.
hidden), it might be safer to say that data structures
(perhaps overlooking all of the Character data) should not be corrupted
in a way that leads to problems including these:
    (1) an exception can be raised later;
    (2) a string is lost track of or a plain string is wrongly pointed to
      more than once;
    (3) an internal length field (in some record) becomes longer than the
     real length of the allocated string.

This could be tasking unsafe in an analogy with how the array
concatenating "&" may be:

    function Ada.Strings.Unbounded."&" (Left, Right : in Unbounded_String)
          return Unbounded_String;

I am unclear on whether a 'To_String()' routine ought do any task
locking.

****************************************************************

From: Robert Dewar
Sent: Thursday, September  5, 2002  7:25 PM

Craig says

> Plain Ada Strings are not task safe.

This is completely confused. Craig, you entirely miss the point. Of course two
tasks accessing the same string is erroneous. If the tasks try to access the
same element of the string (one storing and one reading), then the normal
shared variable rules of 9.10 make the program erroneous, and if they access
separate elements, the independence requirement is not satisfied because the
strings are packed.

This is really not the place for elementary discussion of basic Ada concepts!

> Compilers can refuse to make an Ada String be 'atomic'.

Yes, of course, this rejection is expected if you understand atomic.

> I am unclear on whether a 'To_String()' routine ought do any task
> locking.

Probably you should take this kind of discussion to comp.lang.ada.

Once again (I am assuming no one else is confused, but who knows?): the issue in
a package like unbounded strings is that if two tasks do operations on
different objects of type unbounded string, then there must be no visible
interference between the two operations (internal synchronization of some kind
may or may not be required at the implementation level, but that is of no
concern here).

If two tasks do operations on the same unbounded string object (one reading
and one writing), then the program is erroneous in the normal 9.10 sense.
The only way this would not be the case is if the implementation provided
a pragma Atomic for the type unbounded string, but of course this is only
a theoretical observation, in practice no architecture would provide for
atomic access to strings of unlimited length :-)
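
[Editor's note: A minimal sketch of the distinction drawn above; it is not
taken from the thread. The names Independence_Demo, Worker_1, and Worker_2
are illustrative. Each task updates only its own Unbounded_String object, so
no user-level locking should be needed; whatever synchronization the
implementation requires internally is its own concern.]

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Independence_Demo is
   U1, U2 : Unbounded_String;   --  two separate objects
begin
   declare
      task Worker_1;
      task Worker_2;

      task body Worker_1 is
      begin
         for I in 1 .. 1_000 loop
            Append (U1, 'a');   --  touches U1 only
         end loop;
      end Worker_1;

      task body Worker_2 is
      begin
         for I in 1 .. 1_000 loop
            Append (U2, 'b');   --  touches U2 only
         end loop;
      end Worker_2;
   begin
      null;   --  the block does not exit until both tasks terminate
   end;

   --  Both workers are done, so reading U1 and U2 here is safe.
   Ada.Text_IO.Put_Line (Natural'Image (Length (U1)));
   Ada.Text_IO.Put_Line (Natural'Image (Length (U2)));
end Independence_Demo;
```

[Had both workers called Append on U1 without synchronization, the program
would be erroneous in the 9.10 sense described above; the user would have to
wrap the shared object in a protected object or similar.]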

****************************************************************

From: Pascal Leroy
Sent: Thursday, September  5, 2002  7:55 AM

> To say that reference semantics cannot be used at all is wrong. There is
> no problem in implementing reference counts that are task safe, it just
> requires locks.

Well, it's not that easy.

The problem is that controlled types won't let you lock a piece of data (say,
the reference count) for the entire duration of an assignment operation. You
can of course implement Adjust and Finalize using protected operations, and
that will take care of some of the concurrency problems. However, when you
consider the sequence of events that take place during an assignment operation,
there is a critical period after the rhs has been copied to the lhs but before
Adjust has been called, where the reference count does not actually reflect the
number of references that exist. This inconsistency can cause trouble if
another task comes in at that very moment.

It may be that such situations only arise in the presence of erroneous sharing
in the sense of 9.10, but I couldn't convince myself that this was true.  And
at any rate it doesn't help placate customers who demand a fully tasking-safe
implementation of ASU.

Note that in languages where you redefine the entire assignment operation (e.g.
C++) then evidently you lock the object for the whole duration in the call to
operator= and there is no difficulty.

****************************************************************

From: Robert Dewar
Sent: Thursday, September  5, 2002  8:35 AM

> It may be that such situations only arise in the presence of erroneous sharing
> in the sense of 9.10, but I couldn't convince myself that this was true.  And
> at any rate it doesn't help placate customers who demand a fully tasking-safe
> implementation of ASU.

As far as I can see this is exactly right, it is only a problem if you have
erroneous shared variables in the sense of 9.10.

As for a fully tasking-safe implementation of ASU, I find this an odd concept.
It is unreasonable to demand that the type Unbounded_String be Atomic, and if
it is not, then two tasks doing improper simultaneous access to the same
unbounded string object is erroneous by 9.10.

I see the situation as completely clear here from the user level semantic
point of view. Yes, there may be implementation difficulties in providing
the correct semantics, especially if you attempt inappropriate optimizations
(inappropriate in that they violate the clear requirement that separate
unbounded string objects be independent in the tasking access sense).

There are many opportunities for incorrect optimizations of various Ada
constructs, but this points to problems of implementation, not design!

****************************************************************

From: Nick Roberts
Sent: Friday, September  6, 2002  9:10 AM

I'm doing a bit of summarising of this thread so far. Randy
summarised the discussion at Vienna with regard to the
Ada.Strings packages by writing a proposal for an updated AI 301,
which can be seen here:

http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-00301.TXT

It would seem, after discussions here, the following changes
suggested by this AI would be generally acceptable:

[K1] The addition of procedure To_Bounded_String to package
Ada.Strings.Bounded and procedure To_Unbounded_String to package
Ada.Strings.Unbounded, as proposed.

[K2] The addition of packages Ada.Text_IO.Bounded_IO,
Ada.Wide_Text_IO.Wide_Bounded_IO, Ada.Text_IO.Unbounded_IO, and
Ada.Wide_Text_IO.Wide_Unbounded_IO, as proposed. (Note, I have
assumed these facilities would be as useful for bounded strings
as for unbounded strings.)

[K3] The addition of functions Index and Index_Non_Blank, as
proposed, to Ada.Strings.Bounded and Ada.Strings.Unbounded (only)
(since they are upwards compatible, and may be useful).

[K4] The additional functions and procedures Replace_Slice,
Insert, Overwrite, could be retained, since (in the absence of
the proposed Slice) they are unlikely to cause problems in
practice.

It would seem that the following parts of the proposal need to be
removed:

[D1] The additional functions Slice (since they could introduce
ambiguities in existing code that would cause it to fail to
compile).

I would like to propose the following changes:

[NJR1] The proposed Slice functions could simply be renamed to
Bounded_Slice and Unbounded_Slice, so as to obviate the ambiguity
problem.

[NJR2] The procedures To_Bounded_String and To_Unbounded_String
are renamed Set_Bounded_String and Set_Unbounded_String.
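
[Editor's note: A hedged sketch of how items K3, NJR1, and NJR2 would read in
use. It assumes a compiler whose Ada.Strings.Unbounded provides Index with a
From parameter, Unbounded_Slice, and Set_Unbounded_String under those names
(as in Ada 2005 and later). The procedure name Split_Demo and the sample
text are illustrative.]

```ada
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Split_Demo is
   Text  : Unbounded_String;
   Part  : Unbounded_String;
   Start : Positive := 1;
   Pos   : Natural;
begin
   --  NJR2: procedure form, no function result to assign.
   Set_Unbounded_String (Text, "one,two,three");

   while Start <= Length (Text) loop
      --  K3: search only from Start onward.
      Pos := Index (Text, ",", From => Start);
      exit when Pos = 0;

      --  NJR1: the slice stays an Unbounded_String.
      Part := Unbounded_Slice (Text, Start, Pos - 1);
      Ada.Text_IO.Put_Line (To_String (Part));
      Start := Pos + 1;
   end loop;

   --  Whatever follows the last separator (possibly empty).
   Part := Unbounded_Slice (Text, Start, Length (Text));
   Ada.Text_IO.Put_Line (To_String (Part));
end Split_Demo;
```

[The loop never converts Text to String; only the Put_Line calls leave the
Unbounded_String abstraction.]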

****************************************************************

From: Robert Eachus
Sent: Monday, September  9, 2002  3:45 PM

First I would like to note that Nick Roberts sent a summary of the
position so far, but with the wrong topic.  I hope Randy can get it to
the right spot.

Next, on the topic of "efficient" implementations of
Ada.Strings.Unbounded, I am completely at a loss.  Ada.Strings.Bounded
should be much faster at the cost of "extra" storage space.  If you are
reading lines from a file and want efficiency, you either have a
specified (in some cases by the operating system) maximum length on
lines in a file, or you put in the "extra" overhead to deal with lines
that are too long for the preset buffer size.

Having Ada.Strings.Unbounded available makes this special case code a
lot easier to write.   Ada.Strings.Unbounded is also fine for cases
where you are only reading a few strings and don't want to have to worry
about string lengths.  A perfect example is for holding the command
line. (Yes you can put this in a String constant, but that may take an
extra level of nesting.)  Another example is when you have user input
from the console, or a text buffer in an HTML script.  In any of these
cases the overhead of using Ada.Strings.Unbounded will normally be in
the noise.

Now to respond to Nick's summary:

[K1] The addition of procedure To_Bounded_String to package
Ada.Strings.Bounded and procedure To_Unbounded_String to package
Ada.Strings.Unbounded, as proposed.

I don't know why this is needed, but at least it is harmless.

[K2] The addition of packages Ada.Text_IO.Bounded_IO,
Ada.Wide_Text_IO.Wide_Bounded_IO, Ada.Text_IO.Unbounded_IO, and
Ada.Wide_Text_IO.Wide_Unbounded_IO, as proposed. (Note, I have
assumed these facilities would be as useful for bounded strings
as for unbounded strings.)

Same as above.  But I think that the Bounded versions would be even less
useful, since they would require another generic instantiation.  (There would
have to be a generic formal package parameter.)

[K3] The addition of functions Index and Index_Non_Blank, as
proposed, to Ada.Strings.Bounded and Ada.Strings.Unbounded (only)
(since they are upwards compatible, and may be useful).

Again, may be useful seems damning with faint praise...

[K4] The additional functions and procedures Replace_Slice,
Insert, Overwrite, could be retained, since (in the absence of
the proposed Slice) they are unlikely to cause problems in
practice.

Even more damning. ;-)

[D1] The additional functions Slice (since they could introduce
ambiguities in existing code that would cause it to fail to
compile).

Yes, this should definitely go away.

[NJR1] The proposed Slice functions could simply be renamed to
Bounded_Slice and Unbounded_Slice, so as to obviate the ambiguity
problem.

This would work, but I don't see the need.

[NJR2] The procedures To_Bounded_String and To_Unbounded_String
are renamed Set_Bounded_String and Set_Unbounded_String.

Now I am really confused!  To me the only reason to add these procedures
that makes any sense is to make it clear that you are creating a new
Bounded (Unbounded) value.  The Set_ nomenclature hides this, while Foo
:= To_Unbounded_String(Bar); is a very reasonable way of  showing the
assignment.

So overall I see NO need to change the Ada.Strings.Bounded and
Ada.Strings.Unbounded packages.  The child IO packages for
Ada.Strings.Unbounded are certainly useful.  I don't see any reason to
add the Ada.Strings.Bounded packages other than symmetry.

****************************************************************
