CVS difference for ais/ai-00301.txt

Differences between 1.4 and version 1.5

--- ais/ai-00301.txt	2002/08/29 03:33:21	1.4
+++ ais/ai-00301.txt	2002/08/31 00:32:37	1.5
@@ -741,7 +741,7 @@
 --                                                                          --
 --                                 S p e c                                  --
 --                                                                          --
---                            $Revision: 1.4 $                              --
+--                            $Revision: 1.5 $                              --
 --                                                                          --
 --          Copyright (C) 1992-1998, Free Software Foundation, Inc.         --
 --                                                                          --
@@ -1367,6 +1367,339 @@
 
 I find the idea of a child I/O package reasonable. GNAT has provided that
 for some time. I assume the GNAT spec is in hand in this discussion?
+
+****************************************************************
+
+From: Robert Eachus
+Sent: Thursday, August 29, 2002  2:26 PM
+
+Nick Roberts wrote
+
+>I believe the primary rationale for introducing the
+>Ada.Strings.Bounded and Ada.Strings.Unbounded packages into the
+>standard was based on the assumption that compiler implementors
+>could implement these packages more efficiently (using 'insider
+>knowledge' and/or special machine code) than would be possible by
+>writing them in 'pure' (portable) Ada.
+>
+I widely distributed a package that became the basis for
+Ada.Strings.Bounded, and it was about one page of specification and a body
+that was not much longer.  But I think that what a lot of people who
+used that package, or one very similar, missed was that the functions
+and operations that returned (only) String did so for a reason.
+
+The problem is best illustrated without reference to any specific
+package.  If you write Put_Line(A & B & C), you don't want to be
+ambushed by ambiguity.  The problem is that if you define all patterns
+of function "&" for String and Bounded_String, you are lost.  My rule
+was to overload the right parameter so that, in the above if B or C (or
+both) is a Bounded_String it is parsed as (A & B) & C.  If A is a
+Bounded_String, you can write Put_Line("" & A & B & C) or
+Put_Line(To_String(A) & B & C).  An irritation, but much better than the
+disaster you get if you add more overloadings.  The way I did it was
+usable in the presence of multiple used instantiations of Bounded_String.
+
+Now yes, with Ada 95 and beyond, any user who wants to can declare their
+own bounded and unbounded string packages without much work, or extend
+the existing packages.  But there is a huge difference between the
+(trivial) amount of work required, and the deep understanding of Ada
+rules required.  So the packages should stay in the standard.  Anyone
+who doesn't like the choices made can "roll their own" packages.  I
+often do.  But I know what can work and what can't, and most Ada
+programmers don't have that degree of understanding.
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, August 29, 2002  8:10 PM
+
+Thanks for this clarification Robert (Eachus).
+
+Indeed this makes perfect sense, and I trust the ARG will refrain from
+messing up this carefully thought out design :-)
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Thursday, August 29, 2002  8:57 PM
+
+Robert Eachus wrote:
+
+> The problem is best illustrated without reference to any specific
+> package.  If you write Put_Line(A & B & C), you don't want to be
+> ambushed by ambiguity.  The problem is that if you define all patterns
+> of function "&" for String and Bounded_String, you are lost.  My rule
+> was to overload the right parameter so that, in the above if B or C (or
+> both) is a Bounded_String it is parsed as (A & B) & C.  If A is a
+> Bounded_String, you can write Put_Line("" & A & B & C) or
+> Put_Line(To_String(A) & B & C).  An irritation, but much better than the
+> disaster you get if you add more overloadings.  The way I did it was
+> usable in the presence of multiple used instantiations of
+> Bounded_String.
+
+Thanks for enlightening us as to (part of) the reason for a bad design. :-)
+
+What you didn't say is why you would want any functions that returned String
+other than To_String. Reading between the lines, I would guess that you were
+trying to add a storage management sort of type which is essentially part of
+String, rather than a new abstraction. (That would explain why your spec.
+was so short.)  In that case, you would intend to do almost all operations
+with operands of type String (which is indeed what happens with
+Ada.Strings.Unbounded).
+
+Of course, somewhere in the 9x process, a lot of processing routines were
+added to the design, which makes it look like Ada.Strings.Unbounded is a
+complete abstraction -- which it is not.
+
+If your design had been for a complete abstraction, you wouldn't expect
+Put_Line (A & B) to work for Unbounded strings (in the absence of an
+Unbounded string I/O package) -- indeed, I wouldn't. I'd expect to have to
+write:
+    Put_Line (To_String (A & B));
+
+So, I wouldn't have any functions that returned String other than To_String.
+I'd allow String operands in as many cases as possible (in order to allow
+string literals), but that would strictly be secondary, and I'd only do it
+where overloading problems aren't possible.
+
+Of course, the presence of the String-returning functions in
+Ada.Strings.Unbounded makes it impossible to 'fix' it to have a decent
+abstraction. Indeed, since the intent was that it *not* be an abstraction,
+any adding of stuff to it would make it appear even more as a real (but
+broken) abstraction. Thus, I think that not only should we not add anything
+to the package itself, but we shouldn't add any child packages either (as
+the intent is that Ada.Text_IO is good enough as it is).
+
+(Personally, I'm going to adopt Jean-Pierre's rule: use
+Ada.Strings.Unbounded only if you need storage management, and never, ever
+use it in code that is intended to show the elegance of Ada.)
+
+****************************************************************
+
+From: Robert Dewar
+Sent: Thursday, August 29, 2002  9:17 PM
+
+>>Thanks for enlightening us as to (part of) the reason for a bad design. :-)
+
+Well for the record, I prefer the Eachus design to the (virtual) Brukardt
+one :-)
+
+****************************************************************
+
+From: Craig Carey
+Sent: Friday, August 30, 2002  2:17 PM
+
+At 02\08\29 20:56 -0500 Thursday, Randy Brukardt wrote:
+ >Robert Eachus wrote:
+ >
+ >> The problem is best illustrated without reference to any specific
+...
+ >If your design had been for a complete abstraction, you wouldn't expect
+ >Put_Line (A & B) to work for Unbounded strings (in the absence of an
+ >Unbounded string I/O package) -- indeed, I wouldn't. I'd expect to have to
+ >write:
+ >    Put_Line (To_String (A & B));
+ >
+ >So, I wouldn't have any functions that returned String other than To_String.
+
+But that suggestion that the "&" functions return an Unbounded String, is
+seemingly still under the shadow of the "if" in the text "If your design
+had been for a complete abstraction, ...".
+
+Instead the "&" operators can return a plain String.
+It can be concise that way, if some "To_Unbounded_String()" function is
+renamed as "-": e.g.:
+
+   P, Q, R, S : Unbounded_String;
+   ...
+   Output_Unbounded_String (-(P & Q & R & S));
+
+That is very concise and faster and a reduction of the completeness of
+the abstraction. The plain Strings seem so efficient and simple that
+they ought have some ability to make it better to reduce the
+completeness of an abstraction.
+
+
+...
+ >broken) abstraction. Thus, I think that not only should we not add anything
+ >to the package itself, but we shouldn't add any child packages either (as
+ >the intent is that Ada.Text_IO is good enough as it is).
+ >
+ >(Personally, I'm going to adopt Jean-Pierre's rule: use
+ >Ada.Strings.Unbounded only if you need storage management, and never, ever
+ >use it in code that is intended to show the elegance of Ada.)
+ >
+ >
+
+
+
+At 02\08\27 17:58 -0400 Tuesday, Robert Dewar wrote:
+ >> Would there be some sense in the idea of actually removing these
+ >> packages from the next revision of the standard? Obviously they
+ >> would have to remain available in practice, but their ongoing
+ >> specification (and maybe testing) could fall under the
+ >> jurisdiction of something separate from the (main) Ada language
+ >> standard itself.
+ >
+ >It is completely unacceptable to even consider removing useful functionality
+ >from the standard. This package Strings.Unbounded is in wide use and it would
+ >be unthinkable to remove it from the standard. It would give an impression of
+ >a standards process that had run amok!
+ >
+
+When are the compilers going to implement faster Unbounded Strings?
+That seems to be what doing so would imply: that eventually, at some
+time, the compilers will provide faster Unbounded Strings.
+
+
+Persons misled about how the standards procedure used to run should be
+informed.
+
+There is a problem with the ":=" operation being slow.
+
+Here are some timing results. The rightmost column is microseconds per
+assignment statement. The strings were 500 bytes long, and there
+were 400 assignments per pass through a "declare" block declaring the
+string variables being assigned:
+
+GNAT 3.14p (-O2 option)
+
+*** Access:  X := Y       :   0.0066
+*** V_Str "Assign_Fast()" :   0.0316
+*** Access:  X.all:=Y.all :   0.3130
+*** V_Str "Assign()"      :   0.3816
+*** V_Str ":="            :   1.7724
+*** Unbounded String ":=" :   1.4299
+
+(Unbounded Str x:=y)/(V_Str Ptr Swap) = 45.25  (=1.4299/0.0316)
+(V_Str "x:=y")/(V_Str "Assign(x,y)")  =  4.645 (=1.7724/0.3816)
+
+ObjectAda 7.2.1 (some no debug option):
+
+*** Access:  X := Y       :   0.0000
+*** V_Str "Assign_Fast()" :   0.0500
+*** Access:  X.all:=Y.all :   0.3000
+*** V_Str "Assign()"      :   0.3497
+*** V_Str ":="            :   1.2003
+*** Unbounded String ":=" :   0.9500
+
+(Unbounded Str x:=y)/(V_Str Ptr Swap) = 19.00  (=0.9500/0.0500)
+(V_Str "x:=y")/(V_Str "Assign(x,y)")  =  3.432 (=1.2003/0.3497)
+
+
+Thus if I avoid using Unbounded Strings, a speed improvement
+of as much as 20 to 45 times becomes possible.
+It depends on what fraction of the assignments can be rewritten as
+swaps in which the right-hand side is lost. In one program,
+data strings pass through tasks and procedures, most of which do not
+rewrite the data, and many assignments can be rewritten so that
+the pointers in the fully open string records are swapped.
+
+The ratios 4.645 and 3.432 indicate that strings have to be roughly
+1-7 kilobytes in size before the time lost in copying becomes similar
+to the time spent in handling overheads associated with a use of ":="
+(when the type is a controlled type). How would compiler writers
+fix that problem of there being too much hidden code added by
+vendors' compilers? There may be big projects that made a mistake
+with their choice of strings and are intending to rewrite their code.
+(The Apache webserver project is one that got the choice of strings
+wrong, with the result that the software is slow, and it intends to
+correct that by rewriting the string handling code).
+
+The above numbers show that my StriUnli package's ":=" is slower
+than the Unbounded Strings' ":=".
+
+But that result reverses with Unbounded Strings showing up as worse
+when the fraction of time spent initializing and finalizing both
+types of strings is increased to be half of the maximum possible.
+
+Here are results showing that.
+
+The numbers show microseconds per assignment operation:
+
+GNAT 3.14p
+
+*** Access:  X := Y       :   0.6501
+*** V_Str "Assign_Fast()" :   0.9729
+*** Access:  X.all:=Y.all :   0.7646
+*** V_Str "Assign()"      :   1.5595
+*** V_Str ":="            :   2.4684
+*** Unbounded String ":=" :   3.4704
+
+Aonix ObjectAda 7.2.1:
+
+*** Access:  X := Y       :   0.5051
+*** V_Str "Assign_Fast()" :   0.8499
+*** Access:  X.all:=Y.all :   0.5502
+*** V_Str "Assign()"      :   1.2997
+*** V_Str ":="            :   1.8552
+*** Unbounded String ":=" :   2.5549
+
+The timing had 2 assignments ("x:=y; y:=x;") per finalize [and per
+entry and exit into a 'declare begin end' block]. The strings were
+20 bytes long. The timings were taken running under Windows 2000.
+
+The results show that Unbounded Strings are slower. The GNAT code
+seems to be simpler too. I never closely looked into that. But the
+public might not be especially interested in Ada standards when it
+can apparently get more advanced, simpler, up to 40 times faster
+though not optimized, much less secretive strings packages, just
+by downloading a file from some online archive. The public could
+want to have Unbounded Strings dumped and not see a need to argue
+a good case with the vendors that it should be removed instead of
+being improved so that it runs faster.
+
+Java's buffer/buckets strings allow hints to be given on how much
+to allocate.
+
+---
+
+Currently, to set the length of the V_Str S to 10 in a low-level
+way, I would write this:
+
+    Vsr (S).all.Len := 10;
+
+That does not seem too hard for a person who is new to Ada to learn.
+
+[Note that S is directly changed and with Ada, there is no way to
+have a 4th mode that says that a function parameter is seemingly
+constant inside of the function but seemingly "in out" to the view
+of the place where the function is invoked. Wouldn't time be better
+spent on considering a new mode for parameters?]
+
+
+
+Here is code I used to get the timing results:
+
+  http://www.ijs.co.nz/code/ada95_strings_pkg.zip
+
+****************************************************************
+
+From: Randy Brukardt
+Sent: Friday, August 30, 2002  6:57 PM
+
+> Instead the "&" operators can return a plain String.
+> It can be concise that way, if some "To_Unbounded_String()"
+> function is renamed as "-": e.g.:
+>
+>    P, Q, R, S : Unbounded_String;
+>    ...
+>    Output_Unbounded_String (-(P & Q & R & S));
+>
+> That is very concise and faster and a reduction of the completeness of
+> the abstraction. The plain Strings seem so efficient and simple that
+> they ought have some ability to make it better to reduce the
+> completeness of an abstraction.
+
+Adding more "&" operators would be even more incompatible than what I proposed.
+I'm certain that our "compatibility watchdogs" would raise quite a howl.
+
+Admittedly, a large part of my problem is the verboseness of "To_String" and
+"To_Unbounded_String" when you have to use them in virtually every unbounded
+string expression. A shorter name would be welcome, but I doubt that could be
+done compatibly enough to avoid screwing up existing code.
+
+But, still I find it bizarre that Slice returns a String, and Tail returns an
+Unbounded_String (Tail essentially being a specialization of Slice). Sigh.
 
 ****************************************************************
 

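The exchange above about having to write Put_Line (To_String (A & B)), and
about the verboseness of To_String, can be illustrated with a minimal sketch
against the standard Ada.Strings.Unbounded package. The procedure name and the
unary "+" renaming are assumptions made for illustration only; they are not
taken from the thread, which instead suggested renaming To_Unbounded_String as
"-" in a package whose "&" returns String.

```ada
with Ada.Text_IO;           use Ada.Text_IO;
with Ada.Strings.Unbounded; use Ada.Strings.Unbounded;

procedure Unbounded_Concat_Sketch is

   --  Hypothetical short name for To_String. It is declared locally and
   --  is not part of the standard package.
   function "+" (Source : Unbounded_String) return String
     renames To_String;

   P : constant Unbounded_String := To_Unbounded_String ("Ada.");
   Q : constant Unbounded_String := To_Unbounded_String ("Strings.");
   R : constant Unbounded_String := To_Unbounded_String ("Unbounded");

begin
   --  Each standard "&" that takes an Unbounded_String also returns one,
   --  so the result has to be converted before Ada.Text_IO can print it.
   Put_Line (To_String (P & Q & R));

   --  The same call, using the locally declared short name.
   Put_Line (+(P & Q & R));

   --  Mixing in a String literal still yields an Unbounded_String.
   Put_Line (+("with " & P & Q & R));
end Unbounded_Concat_Sketch;
```

Because the renaming is local to the procedure, it does not change the
standard package and so does not raise the compatibility concern mentioned
above. Whether a one-character operator reads better than To_String is a
matter of taste.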