Version 1.6 of ais/ai-30302.txt


!standard A.17          04-02-13 AI95-00302-04/00
!class amendment 04-02-13
!status work item 04-02-13
!status received 04-02-13
!priority Medium
!difficulty Hard
!subject Container library (mail container)
!summary
This is a dummy AI created solely to hold the voluminous mail on this topic. See AI-302-03 for the actual proposal.
!problem
!proposal
!wording
!example
--!corrigendum A.17
!ACATS Test
!appendix

[Editor's note: For mail earlier than February 8, 2004, see AI-302-3.]

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004  7:33 AM

I suggest the use of controlled types if you want implicit
levels of indirection in the keys or the elements.  Having the
container worry about storage management issues relating to elements
or keys significantly increases their complexity.  We very much
want these containers to be straightforward to define and use.
They are definitely not the final answer, but more the initial
answer -- the 20% that can handle 80% of the problems.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  12:23 PM

>I suggest the use of controlled types if you want implicit
>levels of indirection in the keys or the elements.

That is exactly the problem. The user is forced to do the controlling. That is a
waste of time, and bug-prone. The right controlled behaviour is very hard to get.
How many times is Finalize called?

>  Having the
>container worry about storage management issues relating to elements
>or keys significantly increases their complexity.

If you mean inefficiency, no, at least not significantly: see the
variant unit solution. If you mean source code complexity, sure, a bit,
but so what?

>  We very much
>want these containers to be straightforward to define and use.
>They are definitely not the final answer, but more the initial
>answer -- the 20% that can handle 80% of the problems.

With only definite elements I don't believe in the 80% figure. Just
think: don't you need heterogeneous arrays all the time? For class-wide
programming for example? And logical records? And words, texts,
pictures, all sorts of variable-length stuff?

BTW this is the kind of "resistance" I was talking about. No technical
arguments really. Just a vague downsizing wish. The pointer tradition maybe.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  12:41 PM

Just to make some things clear. I began championing indefinite elements
long ago. Wrote the proposals. They met the "resistance". I let it be. I
assumed the proposals had been viewed and were rejected. The recent
discussion made me wonder if the proposals had really been seen. So I
stepped in just to make sure. I don't want to discuss the issue itself.
That has been done. See the proposals (my Bases document stored in
alternative 1, my proposed Annexes in alternative 2, discussions in
ASCLWG, CLA and here). When I say I won't rediscuss the issue it doesn't
mean I won't give focused explanations here. I'll be glad to do it.
Thanks a lot.

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004  4:25 PM

> ...
> >  We very much
> >want these containers to be straightforward to define and use.
> >They are definitely not the final answer, but more the initial
> >answer -- the 20% that can handle 80% of the problems.
> >
> With only definite elements I don't believe in the 80% figure. Just
> think: don't you need heterogeneous arrays all the time? For class-wide
> programming for example? And logical records? And words, texts,
> pictures, all sorts of variable-length stuff?

But in almost all of these cases, I would not want to be copying
these large objects around.  I (as a user of the abstraction) would want
to control storage allocation of the objects.  That would imply
I would be using access types explicitly, or define an abstraction
which used a controlled type, with perhaps reference counting
of a pointed-to part.

> BTW this is the kind of "resistance" I was talking about. No technical
> arguments really. Just a vague downsizing wish. The pointer tradition maybe.

Sorry if my arguments seem vague.  I would be happy to engage
in a long discussion about this design choice.  I would want
the container to take over storage allocation only in the
case where it is "uniquifying" the objects, and I expect
to "leave" the objects in the container indefinitely, and pass
around keys (essentially pointers or ids) for the objects.
The example of the "string table" comes to mind, where in
a word or language processing tool, the first thing you do
is uniquify all the strings, and then only deal with indices
into the string table thereafter.  This sort of table generally
never goes away, and just grows slowly as new unique strings occur.

The string mapping was included precisely for this application,
as it seems important and common.  However, for other cases, we
felt it was better to let the programmer control storage allocation,
so that the amount of allocation, copying, and deallocation of large,
variable-sized objects could be minimized, and most importantly,
under control of the user.

Please don't confuse "resistance" with simply a difference of
opinion.  We spend long hours debating incredible minutiae
in the ARG meetings.  We rarely take the "easy" route.
We may not document our discussions publically as well as we
should, but rest assured we have a vigorous debate.
The minutes of ARG meetings, which tend to be very good relative
to most minutes I have seen, are nevertheless able to document
only the "tip of the iceberg" of the discussion.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  6:55 PM

Thanks for taking the trouble to review this issue. I'll try to summarize:

You feel the user wants to control allocation himself. Sometimes, yes. In
those times, he just does it. The indefinite element feature won't stand
in his way. I feel most of the time the user does NOT want to bother
with memory management. He will love to have indefinite elements. I
think this is the principal difference between us. You think all or most
users prefer to control allocation themselves. I'm convinced they don't,
and they'd be really happy not to have to.

You fear loss of efficiency due to copying. Containers are by-reference,
so you must be referring to copying of elements. But doesn't that happen
just exactly when it has to, be it in the library or in the user code?
Assuming a well designed library, one which moves only references, not
the things, as you yourself notice. I've done proof-of-concept
implementations of this for alternative 2. The process and associated
discussion with Matt was recorded on the ASCLWG forum. The code is still
online I think, but needs cleansing.

****************************************************************

From: Jeffrey Carter
Sent: Monday, February 9, 2004  1:01 AM

Randy Brukardt wrote:
>
> Huh? You've said, in effect, that the performance isn't good enough
> for applications where the performance doesn't matter. That's a
> pretty goofy statement!

Actually, you originally said something like that. You have said

1. That the vector component should only be used by applications where
performance doesn't matter.

2. That the difference in performance between possible implementations
of vector may be critical to applications that use it.

If performance doesn't matter to these applications, then the
restriction on implementations should be removed. However, I agree with
you that even applications that are suitable for the use of standard
components may find the performance difference between different
implementations critical.

> The problem I see is a lot of people are looking far too closely at
> tiny pieces of abstractions.  You might have a queue or a list as
> part of a large abstraction, but they're pretty much useless by
> themselves. And given that creating a queue or stack (both of which
> have only two operations, both trivial!) would take 3 minutes max, it
> makes no sense to use a complex (and necessarily slow) container
> library for just that -- indeed, it probably would be more work to
> use a container than the 3 minutes.

I have seen a number of these "3-min" structures, and many of them have
subtle errors. These are not beginner mistakes, either; handling dynamic
structures seems to be something that a segment of developers have
difficulty understanding. That these structures are not as easy to
implement as they seem is part of the reason why I think a list
component should be part of a standard library.

Regarding Size and Resize, you wrote:

> That's no different than many of the attributes in Ada, which (if set),
> always return the values that they were set to. But what the compiler does
> with those values is (almost) completely implementation-defined.

There is a difference between a compiler directive and an operation of a
package. The latter must have well defined behavior that is not
implementation defined.

> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.

What does Size return if Resize has not been called?

This description does not agree with the specification in the proposal.
Size "Returns the length of the internal array." Clearly the
implementation must have something that has a length, independent of the
logical length of the value stored in the vector, for Size to return.

Resize "allocates a new internal array whose length is at least the
value Size". Clearly the implemention must allocate a new something with
a new length. What the container does with the new size is not up to it;
it is specified fairly clearly.

The operations, as specified, are pretty meaningless except for an array
implementation.

If the intention is as you described, then the operations appear to be
useless, and should be eliminated. If the intention is as specified,
then these operations are too tied to the implementation, and should be
eliminated.

> I much prefer the vision of this containers library, where the only
> containers included are those that are large, complex, multi-purpose,
> and have a clear abstraction.

The vision I see seems to be muddied. The containers are poorly named,
poorly specified, and confuse abstractions with their implementations.

My intention is to help assure that Ada has as good a container library
as possible in the time available. I assume that the purpose of
presenting the proposal to the Ada-Comment list is to attract comments
on how it could be improved, and there is time to make such comments and
have them considered. I have invested most of this weekend in describing
specific ways I think they could be improved. In many cases I have
provided concrete suggestions for alternative wording, which I
present here. I hope the result will be useful to the committee.

I have already presented my thoughts on changing the type names used to
be consistent with the rest of the standard. I will use the type names
from the proposal here, however, to avoid confusion.

Vectors

The introductory text to Vectors does not make it clear that this is an
extensible array (EA). After reading the package spec, I initially
thought this was a list, perhaps with an unusual implementation. I doubt
if I am special, so I expect such an interpretation from many readers.
After reading the entire section, I encountered the Implementation
Advice that a vector is similar to an array and realized that this was
an EA. An EA is a useful component that I will be happy to see in the
standard.

However, I think it is a disservice to Ada for readers to have to read
the entire section to know what they're looking at. Borrowing from the
introductory text for Strings.Unbounded, which is a special case of an
extensible array, I suggest something along the lines of: "An object of
type Vector_Type represents an array, indexed by Index_Type with
components of Element_Type, whose low bound is Index_Type'First and
whose length can vary conceptually between 0 and the number of values in
Index_type."

The wording used by Strings.Unbounded should serve as a guide to how to
word the text here. Operations in Strings.Unbounded are defined by
analogy to String operations; operations in Vectors should be defined by
analogy to array operations.

Even with such wording changes, however, it is still going to be
difficult for the reader to find what he wants. Someone looking for
vectors is going to be disappointed to find EAs, and someone looking for
an EA is unlikely to look at something named Vectors. Ada should be able
to do better than that. Extensible_Arrays, Flexible_Arrays, and
Unbounded_Arrays have already been suggested by various people here;
given that we already have Unbounded_Strings, Unbounded_Arrays may be
the best choice.

I am not the first to note that Annex A is one of the most accessible
parts of the ARM, and is frequently read by those using the standard
library. It makes sense to recognize this and word these sections as a
users' guide where possible. So, if the ARM gains a mathematical library
of matrices and vectors, we should add to it a comment that those
looking for the kind of vector provided by the STL of C++ or Java's
library should look at package Ada.Containers.Unbounded_Arrays (A.17.2).
In the introductory text to the section, we should mention that an
Unbounded_Array is equivalent to the container called Vector in the STL
of C++ or Java's library (similar to the comment about pointers in 3.10).

Index_Subtype is never used, so it should be eliminated.

Size and Resize were discussed above.

First (Vector) is always Index_Type'First, so it should be a constant.

We iterate over an array A by

for I in A'range loop
    -- use A (I)
end loop;

By analogy, we should iterate over an EA by

for I in First .. Last (EA) loop
   -- use Element and Replace_Element at I
end loop;

Front and Back, therefore, seem to be unnecessary, and may be deleted.
This has the additional advantage that it eliminates concern about
Index_Type'Base needing a greater range than Index_type, and we could
remove the assertion.

Writing prematurely when I thought this was a list, I suggested an
iterator for vectors. I retract that suggestion.

It could be useful to provide an operation to add an item at an index >
Index_Type'Succ (Last (Vector) ) without assigning to the intervening
positions. The component doesn't currently allow this. Possible wording:

procedure Append (Vector   : in out Vector_Type;
                   Index    : in     Index_Type;
                   New_Item : in     Element_Type);

If Index <= Last (Vector), this procedure has the same effect as
Replace_Element (Vector, Index, New_Item).

Otherwise, the length of Vector is extended so that Last (Vector) =
Index, and New_Item is assigned to the element at Index. No value is
assigned to the elements at the new positions with indices in
Index_Type'Succ (Last (Vector) ) .. Index_Type'Pred (Index).

There should be some way to indicate that this last use of "Last
(Vector)" refers to the value before the call. I don't see an easy way
to do that and welcome suggestions.

This leaves the problem that Natural is used for the length of a vector
and the counts of inserted or deleted elements, meaning that index types
with more values than Natural cannot use some index values. This is
avoided in Ada.Text_IO, for example, with a type specific for that purpose.

However, this is really a general problem, and a general solution might
be advisable. There are no predefined modular types in Standard, so we
might want to add

type Maximal_Count is mod implementation-defined;

Maximal_Count'Modulus is the largest power of 2 that may be used as the
modulus of a modular type.

We could add a note that this means Maximal_Count'Modulus =
System.Max_Binary_Modulus, for clarity. I presume it would be
inappropriate to reference System in Standard.

If that's not acceptable, we could add somewhere in the hierarchy,
perhaps in package Ada itself

type Maximal_Count is mod System.Max_Binary_Modulus;

[Would we also like subtype Positive_Maximal_Count?]

New packages could then use Maximal_Count rather than Natural for this
sort of thing. Existing packages could be augmented with parallel
operations that use Maximal_Count.

Maps

Maps is fairly well specified. I think the introductory wording should
again be modified: "The user can insert key/value pairs into a map, and
then search for and delete values by specifying the key. An object of
type Map_Type allows searching for a key in less than linear time."

This is a hashed map and specifies an implementation based on a hash
table. This is appropriate, since a hashed map requires the user to
provide a hash function that is not needed by other implementations.
However, I think the name should reflect this (Hashed_Maps) so that we
don't unnecessarily preclude other forms of Maps.

Since the exact nature of the underlying hash table is implementation
defined, the user doesn't have the information needed to choose an
appropriate size for it. Size and Resize therefore seem inappropriate. I
can hope that users will realize they lack the information to use them
meaningfully, and never call them.

The initial text after the spec seems unnecessarily restrictive of the
implementation. Since the implementation knows best the details of the
hash table, it should determine the initial size of the table.

I agree with the open issue on Swap. I see little use for this operation
on any of the components.

It seems inappropriate to require Insert to resize the hash table. The
implementation should know best when and how to resize the table.

While it's appropriate to discuss nodes as containers of key/value
pairs, it unnecessarily restricts the implementation to talk of nodes
being allocated and deallocated. It should be adequate to say such
things as "Insert adds a new node, initialized to Key and New_Item, to
Map" and "Delete deletes the node from Map".

I don't understand why the string-keyed maps exist, since they are
equivalent to a map with an unbounded string key. The implementation
would have to store the provided key in an appropriate unbounded string,
or duplicate the functionality of unbounded strings. Duplicated
functionality is a bad idea. Moving the conversions to and from
unbounded strings into a special component doesn't seem worth the added
complexity.

Sorted_Sets

The wording here is similar to that for vectors. The introductory text
does not describe the abstraction that the package implements.
Proceeding to the package spec, the reader will probably be puzzled by
the lack of basic set operations such as union and intersection. The
description of the operations that follows does nothing to alleviate the
confusion. A newcomer to the language may very well wonder what's wrong
with these Ada people. Only at the very end of the section do we discover
that this is a structure that provides searching in O(log N) time.

Clearly the choice of Set as the name is confusing and misleading, but
I'm not sure what to suggest as an alternative. Something like
Fast_Search seems to imply that it is an algorithm, not a structure.
Perhaps Sorted_Searchable_Structure would work, but I'm not very happy
with it. Suggestions are welcome.

The introductory text needs to identify what the component is: "An
object of type Searchable_Structure represents a data structure that can
be searched in less than linear time."

Given that this is a searchable structure, the operations seem reasonable.

The descriptions of the operations clearly require an implementation
that performs dynamic allocation and deallocation. This is an
unnecessary constraint on the implementation. A binary search is O(log
N), but is not allowed by the current specification. These descriptions
should be modified along similar lines to the suggestions for maps.

If the package does not use "=" for elements, why does it import it? Why
doesn't the package use "="? It's not clear why it should use
"equivalence" rather then equality.

The package Generic_Keys turns a searchable structure into a map. A
searchable structure is a common implementation of a map. Providing an
alternative implementation of a map is fine, provided that the name
indicates that it is a map. Sorted_Map might be a better name.

It's quite easy to implement a map with a searchable structure
component, so it would be better if the map was another component at the
same level as the hashed map. I would have no objection to the standard
specifying that this map be implemented with an instantiation of the
searchable structure component; it would make the specification of the
map easy. The primary justifications for this change are that it allows
the user who wants a map based on a searchable structure to obtain it
with a single instantiation, rather than the two required as it stands,
and it allows both maps to have similar interfaces, which they do not
have with the existing proposal.

I'm glad the proposal recognizes that both searchable structures and
maps based on them are useful components, even if they go to great
efforts to disguise what they are.

This discussion of the searchable structure and the map based on it
seems to indicate a basic design problem with the hashed map component.
A hash table is not trivial to implement correctly. There are uses for
hash tables other than maps. As it stands, the user who wants a hash
table must create one, duplicating the effort performed for the map, and
increasing the likelihood of errors.

Just as both a searchable structure and a map based on it are desirable,
so both a hash table and a map based on it would be a good idea. The
user who requires a hash table but not a map could use one that has been
tested by many users, reducing both effort and likelihood of errors.
Thus I suggest that the hash table be turned into a component. As with
the map based on a searchable structure, I would have no problem with
the standard specifying that the hashed map be implemented using the
hash table component.

If we can only have one of the hash table or the hashed map components,
I would argue for the hash table, since it is easy to implement a map
given a hash table, but difficult to implement a hash table given a map.

Providing maps based on other packages allows the standard to
demonstrate a layered approach to creating abstractions. Since creating
useful abstractions is a basic process in software engineering, perhaps
the idea might rub off on some readers.

If this suggestion is accepted, the library would increase from three to
five: an extensible array, a hash table, a searchable structure, a map
based on the hash table, and a map based on the searchable structure.
That still seems a fairly minimal library, provides the same
functionality as the proposal, and adds some additional useful
functionality without significant extra effort.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  5:40 AM

> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...

Actually some operations will become faster, like insert in the middle. Also,
append operations which need to extend internal storage become faster.

At least when the stored data is larger than an access -- which should be
80% of the cases.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  8:39 AM

Randy Brukardt wrote:

> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.

It returns the value chosen by the implementation, which can be at least
the size specified.


> The only real requirement here is O(1) element access (which prevents the
> use of a straight linked list).

Yes, that is correct: you cannot use a linked list to implement a vector.

Indeed, if a vector container were implemented as a linked list then it
wouldn't be named "vector"; it would be named "linked list" instead.

My original proposal had 3 kinds of sequence containers: vectors,
deques, and (linked) lists.  There were 3 because each has different
time and space properties.

I would have liked having a list container in the final committee
report, since that's the most natural container for use as a queue.  (I
probably use lists more often than any other container, for exactly that
reason.)  But the size of the proposal had to be reduced somehow.


> Janus/Ada will probably use an array of pointers (or possibly array of
> arrays of pointers); we're going to be (implicitly) allocating the elements
> anyway, we might as well do it explicitly and take advantage of that to make
> Insert/Delete/Sort (and any expansions) much cheaper (presuming the elements
> are bigger than scalar types). An array of arrays of pointers is even
> better, because insertion cost is bounded by the maximum size of an array
> chunk -- but there is more overhead and complexity, so I'd like to see some
> real uses before deciding on an implementation.

My reference implementation just uses an unbounded array internally.  It
sounds like you have some other implementation ideas.

I have the maps done, and I'll host the new reference implementation
this morning (Mon, 9 Feb).


> Note that a pure list component has no real opportunity for "better"
> implementations, and indeed, any implementation on Janus/Ada would suffer
> from "double" allocation.

But a list component has O(1) insertion and deletion at any position.  A
vector is O(1) only at the back end.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:09 AM

Martin Dowie wrote:

>>The only sequence container in the proposal is a vector, which doesn't
>>have a passive iterator.  Again, I recommend just using a loop:
>
> I suspect the first thing I will do is add an extra child generic subprogram
> Ada.Containers.Vectors.Iterate! :-)

You might not have to.  Since there seems to be interest, I added the
following two declarations to the reference implementation:


    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
         (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


The latest version of the reference implementation is available at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
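
[Editor's note: a hypothetical usage sketch of the declarations above; it is
not taken from the reference implementation. The names Demo, Integer_Vectors
and Put_Element are invented, and the Ada.Containers.Vectors formal names
(Index_Type, Element_Type) are assumed from the discussion.]

    with Ada.Containers.Vectors;
    with Ada.Integer_Text_IO;

    procedure Demo is
       package Integer_Vectors is
          new Ada.Containers.Vectors (Index_Type   => Positive,
                                      Element_Type => Integer);
       use Integer_Vectors;

       procedure Put_Element (Element : in Integer) is
       begin
          Ada.Integer_Text_IO.Put (Element);
       end Put_Element;

       --  Instantiate the passive iterator with Put_Element as the Process action
       procedure Put_All is
          new Generic_Constant_Iteration (Process => Put_Element);

       V : Vector_Type;
    begin
       Append (V, New_Item => 42);
       Append (V, New_Item => 43);
       Put_All (V);   --  calls Put_Element once per element, in index order
    end Demo;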

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:14 AM

Martin Krischik wrote:

>>The user can easily code a queue in terms of a Vector (that's one of the
>>uses of Insert!). We dropped the list component because it had an identical
>>interface to the Vector component, but was less flexible (no computed O(1)
>>access).
>
> True enough. But if you wanted to build a generic queue on top of the vector, the
> tag should not be hidden from view. Otherwise one needs to repeat all the
> access methods instead of just renaming the ones provided by the parent
> package.
>
> In fact the hidden tag is the one feature which I really dislike in Charles.

You mean the type tag?  The components are tagged because I needed
controlledness for automatic memory management. They are tagged for no
other reason, and Charles is specifically designed using static, not
dynamic, polymorphism.

For the record I don't think it's realistic to use a vector as a queue
anyway, since deletion from the front end of a vector is O(n).

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:22 AM

Martin Krischik wrote:

> Passive iterators should always provide the fastest means to iterate over the
> whole container. They should do so by knowing the internals of the container.

That is correct.  A passive iterator will usually beat an active
iterator.  But for a vector it probably doesn't make any difference.

However, the latest reference implementation does have passive iterators
for the vector, that look like this:

    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
         (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


The latest version of the reference implementation is available at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

> Of course it only matters in advanced containers with B-Trees or AVL-Trees
> as internal structure. But I have only seen those in IBM's Open Class Library
> (which is far better than the STL).
>
> But there are no advanced containers in AI 302.

The sorted set is implemented using a balanced tree.  The reference
implementation uses a red-black tree, but I suppose an AVL tree would
work too.

The maps are implemented using a hash table.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:30 AM

Stephen Leake wrote:

> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite? Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?

Because that would punish users that have definite key types.

Also, type String isn't just any indefinite type.  It's an array.

The reference implementation for String_Maps looks like this:

    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type (Key_Length : Natural) is
       record
          Key     : String (1 .. Key_Length);
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

> I don't see a discussion of this in AI-302-03/01.

There is a paragraph in there explaining why we have a dedicated map
whose key type is String.


> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.

No.  The parameter of the Resize operation specifies a hint about the
future length of the container, which is subtype Natural.

> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!

The assumption is that a container will always have fewer than
Integer'Last elements.  (On a 32 bit machine that's about 2.1
billion values...)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:34 AM

Randy Brukardt wrote:

> We definitely expect that the strings container will use a purpose-built
> data structure for storing strings, not some general indefinite item
> capability. Ways to compactly and efficiently store sets of varying size
> strings are well known and commonly used.

I didn't do anything special here.  The internal node declaration for
String_Maps looks like this:


    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type (Key_Length : Natural) is
       record
          Key     : String (1 .. Key_Length);
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;


I have hosted the latest version of the reference implementation at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:49 AM

Randy Brukardt wrote:

>>There's also a problem if Natural'Last < Index_Type'Last; you
>>can't have a vector that contains every index!
>
> Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you
> want the Size and Resize operations to take a numeric type that contains
> zero -- and certainly Index_Type is not that. Index_Type could be a subtype
> of an enumeration type or a subtype of a modular type (neither of which can
> contain zero) or a subtype of an integer type not containing zero.
>
> We had a short, inconclusive discussion about whether the index type ought
> to be range <> rather than (<>) (because enumeration and modular types fail
> the assertion and thus aren't directly usable), but that still doesn't
> guarantee a zero. Moreover, if the integer type has negative numbers, then
> the Length of the vector could be larger than Index_Type'Last.

Clearly, if the container is empty, and Index_Type'Base'First =
Index_Type'First, then evaluation of function Last will raise
Constraint_Error.

The issue is whether elaboration of a vector container object can raise
CE if the Index_Type'Base'First = Index_Type'First.

There's no reason why we should punish users whose generic actual index
subtype has Index_Type'Base'First = Index_Type'First, since they can
always defend against CE like this:

    if not Is_Empty (V) and then Last (V) = X then

In fact my reference implementation doesn't require that
Index_Type'Base'First < Index_Type'First, so the assertion in the spec
is somewhat spurious.

I would prefer to weaken the precondition and allow
Index_Type'Base'First = Index_Type'First, but it's really up to
implementors, because allowing that condition will constrain
implementation choices.

> So I don't see a great solution. I wondered about using "Hash_Type" here (it
> has the correct properties), but that seems like a misuse of the type (and a
> bad idea in a library that most Ada programmers will read - you want to show
> them good style in standard libraries).

As I mentioned in my previous message, Resize specifies a hint about the
future number of elements in --that is, the length of-- the container.
My assumption is that no container will ever have more than Integer'Last
number of elements.

If that assumption is incorrect, then maybe the container can be allowed
to grow internally to more than Integer'Last number of elements, but can
only report a maximum value of Integer'Last.

Subtype Natural is the correct choice for the vector Resize operation.

I think the ARG wants to use Hash_Type for Resize for the maps.  My
reference implementation still uses Natural.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:40 PM

> Clearly, if the container is empty, and Index_Type'Base'First =
> Index_Type'First, then evaluation of function Last will raise
> Constraint_Error.

Well, some might think it's clear, but some might think Last returns
First-1, which for a modular type is 'Last.  I'm in favor of making the
Index_Type be "range <>", and also requiring that elaboration of an
instance raise an exception if 'First = 'Base'First.  That would avoid
all these anomalies.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  6:24 PM

That seems reasonable.  It was questionable whether we really needed

  type Index_Type is (<>);

so maybe these issues will require that

  type Index_Type is range <>;

This is probably good enough.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:53 AM

Randy Brukardt wrote:

> So, a passive iterator will only be faster in complex containers (where you
> have to separate the  Element and Successor functions). For a Vector (where
> the language already has the needed iteration mechanism built-in), it's
> going to be slower (or, if you're really lucky, the same speed) and it
> certainly is a lot harder to write.
>
> So I think having it on Vector would simply be for consistency; you'd never
> actually use it if you know you're dealing with a Vector.


As I mentioned in one of my previous messages, the reference
implementation now has a passive iterator like this:

    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
        (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


There seems to be interest in passive iterators for vectors, so we
might as well include them.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:00 AM

Randy Brukardt wrote:

> At which point, you *equal* the performance of the active iterator. And only
> if *everything* goes right. The OP claimed that the passive iterator would
> always have better performance, and that's certainly not true for the vector
> container. I doubt that it would be true for the Map container, either. It
> could be true for a complex container, but those aren't commonly used.

The vector is arguably a borderline case, but we should just include a
passive iterator.  The latest version of the reference implementation
has them for vectors, too.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

For both a (hashed) map and (sorted) set, a passive iterator is likely
to beat an active iterator (other things being equal, of course).

For a map, the reason is that you can just use a loop internally, to
keep track of which bucket you're visiting.  In an active iterator, you
have to compute the hash value again to find the next bucket.
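
[Editor's note: a hypothetical sketch of why the passive form can be cheaper
for a hashed map. The internal names used here (Buckets, Node_Access, Key,
Element, Next) and the Process profile are assumed, loosely following the
reference-implementation excerpts quoted elsewhere in this thread.]

    procedure Generic_Constant_Iteration (Map : in Map_Type) is
    begin
       --  Walk the bucket array directly; no rehashing is needed to find
       --  the "next" node, unlike an active iterator's Successor operation.
       for B in Map.Buckets'Range loop
          declare
             Node : Node_Access := Map.Buckets (B);
          begin
             while Node /= null loop
                Process (Node.Key, Node.Element);
                Node := Node.Next;
             end loop;
          end;
       end loop;
    end Generic_Constant_Iteration;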

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:15 AM

>I suspect the first thing I will do is add an extra child generic
>subprogram Ada.Containers.Vectors.Iterate! :-)

This probably won't be necessary.  I added passive iterators to the
vector reference implementation.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:13 AM

> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...

The reason is that (in what I imagine is a typical implementation)
allowing the key to be indefinite would have drastic performance
implications.

The internal node of the map reference implementation looks like this:


    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type is
       record
          Key     : aliased Key_Type;
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

I can declare the key as a record component directly, because the formal
key type is definite.  Were we to allow indefinite key types, then we
would have to do something like:

    type Node_Type;
    type Node_Access is access Node_Type;

    type Key_Access is access Key_Type;

    type Node_Type is
       record
          Key     : Key_Access;
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

which implies allocating the key object separately from allocation of
the node itself.  This would unfairly punish users that have a definite
actual key type (as Integer or whatever).

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

If you want an indefinite key type, then allocate the key object
yourself and instantiate the component using the key access type.  This
shouldn't be a problem since the map object is typically part of some
higher-level abstraction anyway, so you can hide the allocation and map
manipulation from the users of that higher-level abstraction.

See the !examples section of the proposal for more details.
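
[Editor's note: a hypothetical sketch of the technique described above. The
generic name Ada.Containers.Maps, its formal names (Key_Type, Element_Type,
Hash), the type Hash_Type and the helper String_Hash are assumed for
illustration; a key-equality formal comparing the designated strings would
presumably be supplied as well, and freeing the keys remains the caller's job.]

    type String_Access is access String;

    function Hash (Key : String_Access) return Hash_Type is
    begin
       return String_Hash (Key.all);   --  user-supplied hash of the text
    end Hash;

    package Name_Maps is
       new Ada.Containers.Maps (Key_Type     => String_Access,
                                Element_Type => Integer,
                                Hash         => Hash);

    --  The caller allocates each key explicitly:
    Key : constant String_Access := new String'("an indefinite key");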

****************************************************************

From: Simon J. Wright
Sent: Monday, February 9, 2004  11:37 AM

> The internal node of the map reference implementation looks like this:

Does the aliasing of Element carry any implications for Element_Type?
I am thinking of the use of discriminated types, even with defaulted
discriminants, where aliasing forces the object to be constrained.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  11:48 AM

It means you can't instantiate the container using a
default-discriminated element type.

This is the same problem you have when trying to declare a
default-discriminated record on the heap, or as aliased on the stack.

The solution in all cases is to use a wrapper type sans discriminant,
and instantiate the component using the wrapper type as the element type.
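
[Editor's note: a hypothetical sketch of the wrapper technique described
above. The type names are invented, and the Ada.Containers.Vectors formal
names (Index_Type, Element_Type) are assumed from the discussion.]

    --  A mutable (default-discriminated) type cannot be stored directly,
    --  because the container declares its elements aliased:
    type Message (Length : Natural := 0) is record
       Text : String (1 .. Length);
    end record;

    --  Wrap it in a discriminant-free record and store the wrapper instead;
    --  the inner component keeps its defaulted, changeable discriminant:
    type Message_Wrapper is record
       Value : Message;
    end record;

    package Message_Vectors is
       new Ada.Containers.Vectors (Index_Type   => Positive,
                                   Element_Type => Message_Wrapper);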

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  2:40 PM

This seems like a real issue.  Either the AI needs to specify that
default-discriminated records "don't work", as it were, or the
implementation needs to do the record-wrapping.

Tucker and I have run into this issue in our current project (I think I
wrote a container package, and Tucker instantiated it like that!), and it
wasn't entirely obvious what the best solution was.

****************************************************************

From: Gary Dismukes
Sent: Monday, February 9, 2004  2:49 PM

> It means you can't instantiate the container using a
> default-discriminated element type.

Not stated quite right -- you can instantiate the container with
such a type, but it might not work right.  You might get mysterious
exceptions propagating out of operations if the implementation
reassigns to an Element component in a node.

> This is the same problem you have when trying to declare a
> default-discriminated record on the heap, or as aliased on the stack.
>
> The solution in all cases is to use a wrapper type sans discriminant,
> and instantiate the component using the wrapper type as the element type.

I think that's not an acceptable answer in this case.  These aliased
element components are part of the implementation.  The user shouldn't
need to know about them and it's an abstraction violation in my opinion
if the user is forced to wrap his element type.  Instead it would seem
that the implementation has to do that wrapping.  Ugly, but at least
it keeps the ugliness internal to the container implementation.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:57 PM

>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> This seems like a real issue.  Either the AI needs to specify that
> default-discriminated records "don't work", as it were, or the
> implementation needs to do the record-wrapping.

The problem is that the element type is aliased.  Wrapping it internally
won't work because Generic_Element returns an access object that
designates the element, not the wrapper.

You can't satisfy both conditions simultaneously.  Personally I find
in-place modification of elements much more useful than being able to
store (unwrapped) default-discriminated elements.

One compromise solution is to only disallow instantiation of
Generic_Element, rather than the whole package, if the element type has
a defaulted discriminant.  But I don't know whether this is possible
within the language.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:31 AM

>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> I think that's not an acceptable answer in this case.  These aliased
> element components are part of the implementation.  The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type.  Instead it would seem
> that the implementation has to do that wrapping.  Ugly, but at least
> it keeps the ugliness internal to the container implementation.

That won't work.  Generic_Element returns an access value that
designates an object of type Element_Type, not the internal wrapper
type.  The problem is that objects of (default-discriminated)
Element_Type can't be aliased, so I'm not allowed to say Element'Access.

Perhaps there is some other solution.  I'm not really sure...

****************************************************************

From: Gary Dismukes
Sent: Monday, February 9, 2004  3:47 PM

Matt Heaney wrote:
>
> That won't work.  Generic_Element returns an access value that
> designates an object of type Element_Type, not the internal wrapper
> type.  The problem is that objects of (default-discriminated)
> Element_Type can't be aliased, so I'm not allowed to say Element'Access.

True, that's a problem.

> Perhaps there is some other solution.  I'm not really sure...

Another solution is to use 'Address and unchecked conversion
to the access type, and forget the aliased component.  This is
starting to look unpleasant though :-(

What we really need is something like Tucker's proposal in AI-363
(eliminating access subtype problems), which would prevent this
pesky aliased problem altogether...

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  4:01 PM

Right. And that's still on the table, so there may ultimately be no problem
here for Ada 200Y.

****************************************************************

From: Simon J. Wright
Sent: Tuesday, February 9, 2004  3:16 AM

The Booch Components use Address_To_Access_Conversions for this
precise purpose.
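
[Editor's note: a hypothetical sketch of that pattern, not the Booch
Components' actual code. Node_Type/Node_Access follow the reference-
implementation excerpts quoted earlier; the other names are invented.]

    --  In the context clause of the container body:
    with System.Address_To_Access_Conversions;

    --  Inside the body, with the Element component no longer declared aliased:
    package Element_Conversions is
       new System.Address_To_Access_Conversions (Element_Type);

    function To_Element_Pointer
      (Node : Node_Access) return Element_Conversions.Object_Pointer is
    begin
       --  Go via the component's address instead of 'Access, so the component
       --  need not be aliased (and a default-discriminated element type is
       --  not forced to be constrained).
       return Element_Conversions.To_Pointer (Node.Element'Address);
    end To_Element_Pointer;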

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  4:07 PM

Indeed.  What I was trying to do with Generic_Element is something
similar to what you have in C++:

{
    std::vector<int> v;

    v.push_back(42);

    int& i = v.back();

    ++i;  // i becomes 43
}

The problem is that we don't have references in Ada.  But even so you
can do something like this:

type Integer_Access is access all Integer;

function To_Access is
    new Integer_Vectors.Generic_Element (Integer_Access);

declare
    V : Integer_Vectors.Vector_Type;
begin
    Append (V, New_Item => 42);

    declare
       I : Integer renames To_Access (V, Last (V)).all;
    begin
       I := I + 1;   -- I becomes 43
    end;
end;

This works but the model breaks if the element type has a default
discriminant.

In the case of Integer it is perhaps not necessary to use this
mechanism, but consider if the element of the container is another
container.  You need a variable view of the container element in order
to manipulate it.

I wish there were some other way, something like:

   function Element (V : VT) return Element_Type'Reference;
   --in the pseudo vectors pkg

declare
    V : Integer_Vectors.Vector_Type;
begin
    Append (V, New_Item => 42);

    declare
       I : Integer renames Element (V, Last (V));
    begin
       I := I + 1;
    end;
end;

Here Element_Type'Reference is some kind of virtual type that is limited
and indefinite.  The only thing you're allowed to do with the value
returned by a function that returns T'Reference is to rename it.

But perhaps the ARG has some other, more elegant technique.  Just food
for thought...

****************************************************************

From: Tucker Taft
Sent: Monday, February 9, 2004  5:50 PM

Gary Dismukes wrote:
> ...
> I think that's not an acceptable answer in this case.  These aliased
> element components are part of the implementation.  The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type.  Instead it would seem
> that the implementation has to do that wrapping.  Ugly, but at least
> it keeps the ugliness internal to the container implementation.

I agree.  Just declare a local record type that wraps the
user's type.  And/or hope that the AI that solves this
problem gets accepted.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:18 AM

Tucker Taft wrote:

> I suggest the use of controlled types if you want implicit
> levels of indirection in the keys or the elements.  Having the
> container worry about storage management issues relating to elements
> or keys significantly increases their complexity.  We very much
> want these containers to be straightforward to define and use.
> They are definitely not the final answer, but more the initial
> answer -- the 20% that can handle 80% of the problems.

Ahhhh, the voice of reason.  This is exactly right.

If you want indefinite key types, then you pay for that privilege by having
to do the memory management of indefinite keys yourself.  This is how it
should be.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  12:40 PM

But you could not even store a collection of strings. OK, there are unbounded
strings. But storing 'Class, that's the killer feature. If Ada.Containers can't
do it I am not interested. There will be no 20%/80% split. It's 0% -- I won't use
them.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004  12:36 PM

Sounds more like the voice of the Devil, or at least De Sade, to me.
"Want indefinite? Go do memory management!" Too much pointer programming
in your minds, dudes. No doubt from much systems programming in your
résumés, but you forget not everybody is a systems programmer. For an
application programmer that 80% figure is just so wrong.

(Matt, "this is exactly right", "this is how it should be"? Assertive is
good but now you're sounding like some God (or Devil). I thought you
were an atheist ;-)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  12:58 PM

Ada is a low-level systems programming language.  It gives you the tools
to build higher-level abstractions.

If you need to store elements whose type is indefinite, then you have to
build that abstraction yourself, perhaps using the low-level containers
as a substrate.

As Tucker stated, the containers are the starting point, not the ending
point.  Certainly, building the higher-level abstraction is much easier
with the low-level containers than without.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  12:53 PM

> But storing 'Class, that's the killer feature. If Ada.Containers can't
> do it I am not interested. There will be no 20%/80% split. It's 0% -- I won't
> use them.

The library is designed around the common case, which means definite key
and element types.

If you want to store elements of type T'Class, then you have to use an
access type to instantiate the component, and then do the memory
management of elements yourself.

This is how it should be.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  1:15 PM

 > Ada is a low-level systems programming language.  It gives you the tools
 > to build higher-level abstractions.

As you seem to like strong arguments, let me try this:

This is plain wrong :) Ada is not low-level and certainly not a system
programming language. Ada is a high-level language without a specific
domain; this is my point of view.

I find it really strange that only Vector is being considered, for example. It
would be really useful to have queue, list and stack. Now limiting the
containers to definite types is another restriction...

The idea behind the Ada containers was to have a common set of useful
components for Ada to avoid reinventing the wheel... So the argument
"If you need to store elements whose type is indefinite, then you have to
 build that abstraction yourself" sounds boggus to me ;)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  1:26 PM

You can use a vector as a stack.  The library doesn't need to provide a
stack directly.
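
[Editor's note: a hypothetical sketch of a stack built on the proposed
vector, as suggested above. The vector operation names (Append, Last,
Element, Delete) and the Ada.Containers.Vectors formal names are assumed
from the discussion, not quoted from the draft spec.]

    with Ada.Containers.Vectors;

    generic
       type Element_Type is private;
    package Stacks is
       type Stack_Type is limited private;
       procedure Push (Stack : in out Stack_Type; Item : in  Element_Type);
       procedure Pop  (Stack : in out Stack_Type; Item : out Element_Type);
    private
       package Vectors is
          new Ada.Containers.Vectors (Index_Type   => Positive,
                                      Element_Type => Element_Type);
       type Stack_Type is limited record
          V : Vectors.Vector_Type;
       end record;
    end Stacks;

    package body Stacks is
       procedure Push (Stack : in out Stack_Type; Item : in Element_Type) is
       begin
          Vectors.Append (Stack.V, New_Item => Item);       --  push on the back
       end Push;

       procedure Pop (Stack : in out Stack_Type; Item : out Element_Type) is
       begin
          Item := Vectors.Element (Stack.V, Vectors.Last (Stack.V));
          Vectors.Delete (Stack.V, Vectors.Last (Stack.V)); --  assumed profile
       end Pop;
    end Stacks;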

The library does not provide a list.  I wish it had a list, but the
subcommittee had to reduce the scope of the library and so list didn't
make the cut.

You can use a list as a queue.  The library doesn't need to provide a
queue directly.  However, the library doesn't provide a list, so it
doesn't provide a queue either.

Note that if you need a priority queue, you can use the sorted set.  The
library doesn't need to provide a priority queue directly.


> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
>  build that abstraction yourself" sounds boggus to me ;)

I didn't mean that you have to build the component from scratch.  I
meant only that you have to do the memory management of indefinite
elements yourself.  The higher-level component that you build can be
implemented using the low-level containers.

Real systems are built from the bottom up.  All we did was to provide
the lowest level in the abstraction hierarchy.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:00 PM

Matthew,

 > You can use a vector as a stack.  The library doesn't need to provide a
 > stack directly.

Except that a stack should have a far more limited set of operations. This
ensures that the stack abstraction is not worked around.

 > The library does not provide a list.  I wish it had a list, but the
 > subcommittee had to reduce the scope of the library and so list didn't
 > make the cut.

I really think that this should be reconsidered. A list is the most used
abstraction in much of the software I have built/seen.

 > You can use a list as a queue.

Of course, but again this is wrong in my view. The abstraction should be
constrained to the set of operations for a queue. In that case why not remove
the vector? It can be implemented easily with a map, the key being the index of
the item in the array :)

 > Note that if you need a priority queue, you can use the sorted set.  The

This is a more high-level component; I agree that it is OK not to include it.

If we miss some important components in the standard container library, what
will we do? Use another component library like Charles or PragmARC... and not use
the standard container library... so what's the point????

The most important point in a container library is *completeness* I would
say. This is exactly what STL has done.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  12:16 PM

> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type.  This
> shouldn't be a problem since the map object is typically part of some
> higher-level abstraction anyway, so you can hide the allocation and map
> manipulation from the users of that higher-level abstraction.

But Ada hasn't got a garbage collector, so there is the deallocation problem,
especially when the container is copied or passed around.

And Ada (unlike C++) can do better! With Ada you can have a container with
indefinite types where with C++ you can't. We should not give away that
advantage.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004  1:07 PM

> Ada is a low-level systems programming language.  It gives you the
> tools to build higher-level abstractions.

Ok. Thanks for recentring the argument. So your position is that the
standard should not give high-level facilities. Personally I see Ada's
doom in that position. A stillborn Ada 2005.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:03 PM

Sadly, I feel the same :(

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  1:56 PM

> If you want indefinite key types, then you pay that privilege, by
> having to do the memory management of indefinite keys yourself.  This
> is how it should be.

Ok. I'd like to see that rationale documented in the final version of
the AI, so people understand why Ada.Containers.String_Map isn't
simply an instantiation of Ada.Containers.Map.

One more argument for indefinite keys; if a C++ person looks at this,
they can say "Ada generics are so weak they can't even allow a String
as a key!". Not good for the "let's attract more users" goal.

And I will continue to use SAL, where the containers do the memory
management, because I like that design point better :).

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:13 PM

Stephen Leake wrote:

> One more argument for indefinite keys; if a C++ person looks at this,
> they can say "Ada generics are so weak they can't even allow a String
> as a key!". Not good for the "let's attract more users" goal.

But you can't do that in C++, either.  Indeed, C++ doesn't have
indefinite types so it's unlikely a C++ programmer would even think to
ask that question.


> And I will continue to use SAL, where the containers do the memory
> management, because I like that design point better :).

Real systems are built from the bottom up.  All we did was to provide
the lowest-level in the abstraction hierarchy.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  4:20 PM

> But you can't do that in C++, either.  Indeed, C++ doesn't have
> indefinite types so it's unlikely a C++ programmer would even think to
> ask that question.

Hmm. To be specific:

can a C++ STL Map be instantiated with a C++ STL String as the Key?

I'll have to check, but I bet the answer is "yes".

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  6:27 PM

Yes of course an STL map can be instantiated with type std::string as the
key, but that type is analogous to Ada's Unbounded_String, not String.

> I'll have to check, but I bet the answer is "yes".

Yes it can, but you're comparing apples and oranges.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  8:36 PM

Ok. And Ada.Containers.Map can be instantiated with Unbounded_String
as the Key. Good enough.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:22 PM

Pascal Obry wrote:

> Except that a stack should have a far more limited set of operations. This
> ensure that the stack abstraction is not worked-around.

Fine.  Then you can implement that stack abstraction yourself, using a
vector as the implementation.
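
[Editor's note: a sketch of the layering suggested above, for illustration
only.  Only the stack operations are exported, so the abstraction cannot be
worked around; the vector is hidden in the private part.  The vector
operations are written with their eventual Ada 2005 names, as a stand-in
for the proposed vector package.]

   with Ada.Containers.Vectors;

   generic
      type Element_Type is private;
   package Stacks is

      type Stack is tagged limited private;

      procedure Push (S : in out Stack; E : in Element_Type);
      procedure Pop  (S : in out Stack; E : out Element_Type);
      function  Top  (S : Stack) return Element_Type;
      function  Is_Empty (S : Stack) return Boolean;

   private

      package Element_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => Element_Type);

      type Stack is tagged limited record
         Rep : Element_Vectors.Vector;   --  clients never see this
      end record;

   end Stacks;

   package body Stacks is

      procedure Push (S : in out Stack; E : in Element_Type) is
      begin
         S.Rep.Append (E);
      end Push;

      procedure Pop (S : in out Stack; E : out Element_Type) is
      begin
         E := S.Rep.Last_Element;
         S.Rep.Delete_Last;
      end Pop;

      function Top (S : Stack) return Element_Type is
      begin
         return S.Rep.Last_Element;
      end Top;

      function Is_Empty (S : Stack) return Boolean is
      begin
         return S.Rep.Is_Empty;
      end Is_Empty;

   end Stacks;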

> I really think that this should be reconsidered. A list is the most used
> abstraction in many software I have built/seen.

I think so too, but the subcommittee had to reduce the scope of the
proposal and so lists didn't make the cut.

If you ask for too much then you might not get anything.

> Of course but again this is wrong in my view. The abstraction should be
> constrained to the set of operations for a queue.

Fine.  Then you can implement that queue abstraction yourself, using a
list as the implementation.

>In that case why not remove
> the vector, it can be implemented easily with a map, the key is the index of
> the item in the array :)

That would be an example of "abstraction inversion": using a
higher-level abstraction to implement a lower-level one.

This is the mistake they made in Ada83, requiring that high-level tasks
be used to implement low-level synchronization constructs such as semaphores
and monitors.

Ada is a low-level systems programming language.  It is not Perl.

> If we miss some important components in the standard container library what we
> will do ? Use another component library like Charles or PragmArc... an not use
> the standard container library... so what the point ????

Do whatever you're doing now.

The intent of the committee is that this small, modest set of containers
will provide the impetus for a secondary standard.

> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.

Well, my original proposal included all the containers in the STL and
then some.  So don't blame me!

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:52 PM

 > Fine.  Then you can implement that stack abstraction yourself, using a
 > vector as the implementation.

Of course, I can also implement everything myself :)

 > Fine.  Then you can implement that queue abstraction yourself, using a
 > list as the implementation.

Of course, I can also implement everything myself :)

 > That would be an example of "abstraction inversion": using a
 > higher-level abstraction to implement a more low-level one.

As it is to implement a stack over a vector abstraction.

 > Ada is a low-level systems programming language.  It is not Perl.

It is not Perl, but it is not a low-level systems programming
language either :) And yes, I'll keep repeating this :)

 > Do whatever you're doing now.

But I don't !!! That's the whole point of the container library.

 > The intent of the committee is that this small, modest set of containers
 > will provide the impetus for a secondary standard.

Ok. That's a point.

 > > The most important point in a container library is *completeness* I would
 > > say. This is exactly what STL has done.
 >
 > Well, my original proposal included all the containers in the STL and
 > then some.  So don't blame me!

I know, Matthew, and I want to thank you for the hard work. I just expected a
bit more, so I'm frustrated :)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:00 PM

Martin Krischik wrote:

> But Ada hasn't got a garbage collector, so there is the deallocation problem.
> Especially when the container is copied or passed around.

You are responsible for memory management of the indefinite elements.
Implement your high-level abstraction using the low-level container,
instantiated with an access type.

> And Ada (unlike C++) can do better! With Ada you can have a container of
> indefinite types, where with C++ you can't. We should not give away that
> advantage.

There is only a slight difference here between Ada95 and C++.  In Ada95
you can do this:

    procedure Insert (C : in out CT; E : in ET) is
       EA : constant ET_Access := new ET'(E);
    begin
       ...

This will work even if ET is indefinite.

In C++ the type has to have a clone operator or whatever:

    void insert(const e_t& e)
    {
       e_t* const pe = e.clone();
       ...
    }

Internally the components wouldn't be any different.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  2:04 PM

Matthew Heaney <mheaney@on2.com> writes:

> Stephen Leake wrote:
>
> > What is the rationale for making the Map Key_Type definite, as opposed
> > to indefinite? Since an indefinite Key_Type is required for
> > Containers.Maps.Strings, why not make that capability available to the
> > users?
>
> Because that would punish users that have definite key types.

Can you elaborate on this? I don't see it.

> Also, type String isn't just any indefinite type. It's an array.
>
> The reference implementation for String_Maps looks like this:
>
>     type Node_Type;
>     type Node_Access is access Node_Type;
>
>     type Node_Type (Key_Length : Natural) is
>        record
>           Key     : String (1 .. Key_Length);
>           Element : aliased Element_Type;
>           Next    : Node_Access;
>        end record;

Obviously you can optimize a container if you know the specific types
involved. But the standard containers aren't supposed to be about
highly optimized code; they are supposed to be about generally useful
code.

> > I don't see a discussion of this in AI-302-03/01.
>
> There is a paragraph in there explaining why we have a dedicated maps
> whose key type is String.

Yes. It does _not_ say why Ada.Containers.Maps.Key_Type is _not_
indefinite. That's what I'd like to see.

> > Another point: Containers.Vectors.Size should return
> > Index_Type'Base, and the Size parameter in Resize should also be
> > Index_Type'Base. It's confusing to have different types for Size
> > and Index.
>
> No.  The parameter of the Resize operation specifies a hint about the
> future length of the container, which is subtype Natural.

Why is it Natural? Randy pointed out that Index_Type'Base might not
include 0, or even be an enumeral. I'd rather see Index_Type be
specified as a signed integer, including 0, rather than have Size
return a type that is not Index_Type. (SAL makes this choice).

> > There's also a problem if Natural'Last < Index_Type'Last; you
> > can't have a vector that contains every index!
>
> The assumption is that a container will always have fewer the
> Integer'Last number of elements.  (On a 32 bit machine that's 4.2
> billion values...)

And that assumption is precisely the problem. On systems where
Integer'Last is 2**15, you can't have large containers. Ada must not
make such assumptions!

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  2:14 PM

> The internal node of the map reference implementation looks like this:

Ok. That makes sense. I suggest this level of detail be kept in the
Rationale for the Ada.Containers package.

I address this issue in SAL
(http://www.toadmail.com/~ada_wizard/ada/sal.html) by allowing the
user to specify both the Key_Type and the Key_Node_Type, and provide a
function To_Key_Node to go from one to the other. For definite keys,
the types are the same, and To_Key_Node is an inlined null function,
so there is no overhead. For indefinite keys, that function does the
allocation.
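
[Editor's note: a hypothetical rendering of the formals Stephen describes,
not SAL's actual specification; it is only meant to show the shape of the
Key_Type / Key_Node_Type / To_Key_Node idea.]

   generic
      type Key_Type (<>) is private;  --  what clients see; may be indefinite
      type Key_Node_Type is private;  --  definite representation actually stored
      with function To_Key_Node (Key : Key_Type) return Key_Node_Type;
      --  Definite key: Key_Node_Type is simply Key_Type, and To_Key_Node is
      --  an inlined identity function, so there is no overhead.
      --  Indefinite key: Key_Node_Type is, say, an access type, and
      --  To_Key_Node does the allocation.
      type Element_Type is private;
   package Keyed_Containers is
      ...  --  map operations as in the proposal, expressed in terms of Key_Type
   end Keyed_Containers;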

Hm. In shared code generics, I guess the "inlined null function" does
not get optimized away. So perhaps this would not be an appropriate
approach for a standard Ada package.

Actually, in SAL, keys are always stored in the Items, so you'll only
see Item_Type, Key_Type, and Item_Node_Type, not Key_Node_Type. But
the principle is the same.

It is more complex to instantiate SAL containers than the proposed
Ada.Containers.Map. But I would argue that it is worth it.

> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type.
> This shouldn't be a problem since the map object is typically part of
> some higher-level abstraction anyway, so you can hide the allocation
> and map manipulation from the users of that higher-level
> abstraction.

Ok. In SAL, I don't have two layers. And I agree with others who say
that Ada should provide a useful container that does "typical" memory
management tasks for you.

But any container is better than none :).

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Monday, February 9, 2004  3:05 PM

Pascal Obry wrote:

> Ada is not low-level and certainly not a system
> programming language. Ada is an high level language without a specific
> domain, this is my point of view.

Self-contradictory viewpoint, though - because high level language without a
specific domain and low-level system programming language are roughly the same
thing -:)

> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
> build that abstraction yourself" sounds boggus to me ;)

If we call them "containers" then they should, in some substantial sense,
*contain* things, not just refer to them. So, in this case, they should do
all associated memory management. Otherwise, they aren't Containers, they are
Inventories. It is an improper name that confuses the matter and creates heated
argument.

Also, it seems that the library is planned without looking at new features
in Ada2005, particularly, interfaces. I think that this (if true) may be a
serious mistake. Interfaces may provide a way for reconciling different
requirements.

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004  1:04 AM

I would be very happy to see an Ada.Container.Interfaces (or
Ada.Container.Signatures) package/hierarchy, specifying APIs, which could
then be used to achieve (static) polymorphism.
I think this is the place to provide Stack, Queue interfaces etc. as well.
I think that's a good way to encourage the building block approach.
As far as I recall from the workshop we had in Vienna (right?), not many shared
my enthusiasm, alas.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 10, 2004  6:53 PM

Alexandre E. Kopilovitch wrote:

...
> Also, it seems that the library is planned without looking at new features
> in Ada2005, particularly, interfaces. I think that this (if true) may be a
> serious mistake. Interfaces may provide a way for reconciling different
> requirements.

I wondered how long it would be before someone asked that question.

I did in fact do some (idle) thinking on that question, and I concluded that
interfaces wouldn't be useful for the containers library.

What you'd like is to be able to write interfaces that describe iteration,
for example, and be able to use those without knowing anything about the
underlying container. Similarly, you could have a sequence interface that
worked with any sequence container.

However, that doesn't really work. The primary problem is that the profiles
of the operations of an interface are fixed other than the object itself.
But, for a container, the operations contain a generic formal type (the
element type), as well as the object type. That means that general
interfaces (like the ones described above, for example) can't be written
that would match any possible element type, only a specific element type
(which is pretty useless).
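
[Editor's note: the difficulty described above, in sketch form.  An
interface operation has a fixed profile, so the element type has to be
pinned down where the interface is declared; there is no way to leave it
generic.  The names below are invented.]

   package Sequence_Interfaces is

      type Sequence is interface;

      --  Legal, but only for one specific element type:
      procedure Append (S : in out Sequence; E : in Integer) is abstract;

      --  There is no way to write "E : in <any element type>" here; that
      --  would require a generic formal, and then the interface would have
      --  to live inside a generic unit, which defeats its purpose.

   end Sequence_Interfaces;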

One way to get around that would be to put the interfaces into the generic
units. But then, the interfaces would only be usable with that container --
hardly a useful interface! You might as well just use the container
directly.

A better way would be to make the element type an interface itself. Then you
could write useful non-generic interfaces. But that would limit the
contained objects to types that can have an interface: tagged types, and
perhaps task and protected types (and of course have the required
interface). That sort of limitation isn't going to fly for the primary
container library - a container of access values is just too common and
important. (I could imagine an O-O offshoot that worked that way - in a
secondary standard.)

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Tuesday, February 10, 2004  9:45 PM

Randy Brukardt wrote:

> I did in fact do some (idle) thinking on that question, and I concluded that
> interfaces wouldn't be useful for the containers library.
>
> What you'd like is to be able to write interfaces that describe iteration,
> for example, and be able to use those without knowing anything about the
> underlying container. Similarly, you could have a sequence interface that
> worked with any sequence container.

Yes.

> However, that doesn't really work. The primary problem is that the profiles
> of the operations of an interface are fixed other than the object itself.
> But, for a container, the operations contain a generic formal type (the
> element type), as well as the object type. That means that general
> interfaces (like the ones described above, for example) can't be written
> that would match any possible element type, only a specific element type
> (which is pretty useless).

This shows an unpleasant incompatibility of interfaces with generics. Well,
perhaps "incompatibility" is too strong a word for that, but anyway there is
some inconsistency; these notions do not collaborate smoothly. And this is
a general issue, regardless of the container library.

> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.

Yes, this is clearly a poor way.

> A better way would be to make the element type an interface itself. Then you
> could write useful non-generic interfaces. But that would limit the
> contained objects to types that can have an interface: tagged types, and
> perhaps task and protected types (and of course have the required
> interface). That sort of limitation isn't going to fly for the primary
> container library - a container of access values is just too common and
> important.

I don't understand the latter sentence - I thought that access to interfaces
is permitted... I'm looking at the last example in AI-251 (under the line
"A somewhat less artifical example") - there is type Object_Reference, which
is access to interface type Monitored_Object'Class, and this Object_Reference
is used for parameters of procedures Register and Unregister.

And if you meant that those access values may point to untagged types then
I think that "boxing" those untagged types will not significantly annoy a
programmer.

But anyway I don't think that this way is generally better. It artificially
pushes a container into the position of a "controlling object", which isn't a
good thing. And it often convolutes thinking... seems no better than typical
C++ puzzles, a maintainer's hell.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 10, 2004 11:03 PM

> I don't understand the latter sentence - I thought that access to interfaces
> is permitted... I'm looking at the last example in AI-251 (under the line
> "A somewhat less artifical example") - there is type Object_Reference, which
> is access to interface type Monitored_Object'Class, and this Object_Reference
> is used for parameters of procedures Register and Unregister.

Yes, but access types themselves are not tagged. What they point at is
irrelevant. If you have a formal "type T is tagged private;" no access type
will match that; it's the same for interfaces.

You could of course wrap the access type in a tagged record, and give the
interface to that, and then the element type could be that. But then you have
an extra component name in every use, which is annoying.
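
[Editor's note: a sketch of the wrapping just described, with invented
names.  The access value itself cannot implement an interface, but a
one-component tagged wrapper can; the cost is the extra component name at
every use.]

   type Printable is interface;
   procedure Print (X : Printable) is abstract;

   type String_Access is access all String;

   type String_Ref is new Printable with record
      Ptr : String_Access;   --  the extra component you now drag around
   end record;

   overriding procedure Print (X : String_Ref);
   --  Clients write X.Ptr.all where they would rather just write X.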

For lower-level uses, having a vector/sequence of pointers or a map of pointers
certainly sounds useful and common; forcing wrapping is not going to win any
style points.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:28 PM

Regarding support for indefinite keys,
Martin Krischik said:

> But you could not even store a collection of strings. OK, there are
> unbounded strings. But storing 'Class, that's the killer feature. If
> Ada.Containers can't do it, I am not interested. There will be no 20%/80%
> split. It's 0% - I won't use them.

How about this: you write a package that supports the indefinite case,
and you build it on top of the (currently proposed) standard package
that supports only definite?  The definite-only package takes care of
the hashing or whatever, and your package takes care of memory
management for the indefinite keys.

Maybe you try to get your package to be a de-facto standard, or a
secondary standard.

The point is, you *can* use the definite-only package, but only
indirectly, via a wrapper package.  The definite-only package isn't
useless; it does *part* of the job you desire.  This seems like a better
design than making a single package that supports both, and somehow
magically optimizes the definite cases.

If the RM supports indefinite, I claim it should do so by providing two
separate packages.  But we're trying to minimize the size of all this,
so we choose just the lower-level one of those.

Yeah, it would be nice if the RM provided both...
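
[Editor's note: a sketch of the two-layer design described above, with
invented names, and with the eventually standardized
Ada.Containers.Hashed_Maps standing in for the proposed definite-key map.
The low-level map sees only definite access values; the wrapper owns the
allocation of the indefinite String keys, and hashing and equivalence are
done on the designated strings.  The Integer element type is an arbitrary
choice for the example.]

   with Ada.Containers.Hashed_Maps;
   with Ada.Strings.Hash;

   package String_Keyed_Maps is

      type String_Access is access String;

      function Hash (Key : String_Access) return Ada.Containers.Hash_Type;
      function Equivalent (Left, Right : String_Access) return Boolean;

      package Maps is new Ada.Containers.Hashed_Maps
        (Key_Type        => String_Access,  --  definite, as the proposal requires
         Element_Type    => Integer,
         Hash            => Hash,
         Equivalent_Keys => Equivalent);

      procedure Insert (M : in out Maps.Map; Key : String; Value : Integer);
      --  Allocates a copy of Key; callers never see the access type.
      --  (A complete wrapper would also free the keys on Delete and Clear.)

   end String_Keyed_Maps;

   package body String_Keyed_Maps is

      function Hash (Key : String_Access) return Ada.Containers.Hash_Type is
      begin
         return Ada.Strings.Hash (Key.all);   --  hash the designated string
      end Hash;

      function Equivalent (Left, Right : String_Access) return Boolean is
      begin
         return Left.all = Right.all;         --  compare contents, not pointers
      end Equivalent;

      procedure Insert (M : in out Maps.Map; Key : String; Value : Integer) is
      begin
         M.Insert (new String'(Key), Value);
      end Insert;

   end String_Keyed_Maps;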

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  5:36 PM

These seem like ideal candidates for the hoped-for containers secondary
standard.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  11:09 AM

Randy Brukardt wrote:

> If we want an array sort, we should declare one:
>
>     generic
>        type Index_Type is (<>);
>        type Element_Type is private;
>        function "<" (Left, Right : Element_Type) return Boolean is <>;
>        type Array_Type is array (Index_Type) of Element_Type;
>     procedure Ada.Generic_Sort (Arr : in out Array_Type);
>
> (We'd need an unconstrained version, too.) But keep it separate from the
> Vector one (or any List one, for that matter).

I added the following library-level declarations to the latest
reference implementation:

AI302.Containers.Generic_Sort_Constrained_Array
AI302.Containers.Generic_Sort_Unconstrained_Array
AI302.Containers.Generic_Sort

The latter works for any sequence having a random-access iterator, um, I
mean cursor.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

They're all basically the same: a simple quicksort using a median-of-3
to choose a pivot.

The Generic_Sort for the vector is implemented as an instantiation of
the generic sort for arrays.
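
[Editor's note: for readers who want to see the shape of the algorithm
mentioned above, here is a small self-contained sketch of a quicksort with
a median-of-3 pivot, written independently for illustration.  It is not
the reference implementation's code, and the index type is simplified to a
signed integer rather than the AI's "(<>)" formal.]

   generic
      type Element_Type is private;
      with function "<" (Left, Right : Element_Type) return Boolean is <>;
      type Index_Type is range <>;
      type Array_Type is array (Index_Type range <>) of Element_Type;
   procedure Generic_Quicksort (A : in out Array_Type);

   procedure Generic_Quicksort (A : in out Array_Type) is

      procedure Swap (I, J : Index_Type'Base) is
         T : constant Element_Type := A (I);
      begin
         A (I) := A (J);
         A (J) := T;
      end Swap;

      procedure Sort (Lo, Hi : Index_Type'Base) is
         Mid   : constant Index_Type'Base := Lo + (Hi - Lo) / 2;
         I     : Index_Type'Base := Lo;
         J     : Index_Type'Base := Hi;
         Pivot : Element_Type;
      begin
         --  Median-of-3: order A (Lo), A (Mid), A (Hi); the median lands at Mid
         if A (Mid) < A (Lo)  then Swap (Lo, Mid); end if;
         if A (Hi)  < A (Lo)  then Swap (Lo, Hi);  end if;
         if A (Hi)  < A (Mid) then Swap (Mid, Hi); end if;
         Pivot := A (Mid);

         --  Partition around the pivot value
         loop
            while A (I) < Pivot loop I := I + 1; end loop;
            while Pivot < A (J) loop J := J - 1; end loop;
            if I <= J then
               Swap (I, J);
               I := I + 1;
               J := J - 1;
            end if;
            exit when I > J;
         end loop;

         if Lo < J then Sort (Lo, J); end if;
         if I < Hi then Sort (I, Hi); end if;
      end Sort;

   begin
      if A'Length > 1 then
         Sort (A'First, A'Last);
      end if;
   end Generic_Quicksort;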


****************************************************************

From: Robert A. Duff
Sent: Sunday, February 8, 2004  12:09 PM

Marius Amado Alves wrote:

> In the meanwhile, there is no requirement that Ada.Containers be
> implemented strictly in Ada, is there?

No.  However, there is a "meta requirement" that Ada.Containers be
implementABLE in Ada, and I expect all implementations will be in plain
vanilla Ada without compiler-specific tricks.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  1:41 PM

The proposal can be implemented in Ada today.  In fact it already is:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:58 AM

I agree. I think the meta requirement is the way to go. If there is some good
reason to resort to non-Ada code, it should be allowed, so long as the API is
maintained. BUT, it would reflect badly on the language if the only way to
implement this sort of library efficiently would require going outside the
scope of the language. Remember, Ada is a general-purpose, reuse-oriented
language.

One of the reasons I wanted this discussion (and I pushed for a standard
container library back when practically no one wanted to hear...) is that I
think that by working on standard libraries it is easier to focus on areas
where the language needs improvement.
I think this is in fact what's happening right now...

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  2:37 PM

Right, and my point was that I want to keep it that way.
I suggest the AI mention this "meta requirement" in its discussion.

Some folks have suggested some sort of compiler-specific "magic" going
on behind the scenes.  I don't want that.

>...  In fact it already is:
>
> <http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>

I thank you for your hard work on this.  I haven't had a chance to look
at it yet, though.  What sort of copyright does it have?  Can the
various implementers just take your code and use it as their
implementation of this AI?

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:46 PM

Yes.  That was the intent.

We can attach any copyright necessary to allow implementors or anyone
else to use it.

Will the GMGPL work?  I'm not an expert on these matters.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:36 PM

I suspect the GMGPL would work, but I'm not an expert on these matters,
either.  I suggest you ask Robert Dewar.

****************************************************************

From: Pascal Leroy
Sent: Monday, February 9, 2004  10:41 AM

> I've just posted the report of the containers committee on
> Ada-Comment. The executive summary follows. You can read the
> whole report in the !appendix to AI-00302-3/01, which you can
> find at: http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT
> or you can download the ZIP or tar files from:
> http://www.ada-auth.org/ais.html

Good job.  A few comments after a first perusal:

1 - Insisting on O(N log N) complexity for the sorting algorithm
excludes Shellsort.  This is misguided in my opinion, as Shellsort often
behaves better in practice than Quicksort (in particular, if the input
file is nearly in order).

2 - I would really like it if the definition of containers were written
without a particular implementation in mind.  It's OK to explain that a
Vector is logically an array, but _requiring_ that insertion at the
beginning should take time O(N) is nonsensical!  This is preventing
possibly better implementations.  I have also seen in a mail by Randy
that element access has to be in O(1) (somehow I can't find this in the
AI).  Again, I believe that this is overspecification.  A skip list
would be in my opinion a perfectly good implementation of a Vector, as
in most practical situations the difference between O(1) and O(Log N)
doesn't matter.  But the O(1) requirement precludes a skip list
implementation...

3 - Similarly, I don't understand why the definition of Maps insists on
a hash-based implementation.  I have no problem with the notion that
this generic takes a hash-function, as this can be generally useful
whatever the implementation strategy.  But I don't see why it's
necessary to insist on or expose the details of a hash-based
implementation.  For large maps, a tree-based implementation makes
probably more sense.  We should not prevent such an implementation.
Furthermore, the description seems to require a hash-based
implementation that tries to keep the collision lists reasonably short
(by increasing the number of buckets) and that can lead to very
expensive deallocation/reallocation.

4 - Like others, I don't like the type names ending in _Type (but I
realize that's a matter of taste).  More seriously, I don't like the
usage of the word Vector, as this word is already used by AI 296.  Since
it might make perfect sense to have a vector-302 of vectors-296 (e.g.
successive positions of a mobile) the terminology is only going to cause
confusion among users.  Of all the proposals that I have seen, Sequence
has my preference.  And I don't give a damn what the terminology is in
Java or C++.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  11:02 AM

> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort.  This is misguided in my opinion, as Shellsort often
> behaves better in practice than Quicksort (in particular, if the input
> file is nearly in order).

Or what about linear sorts like address calculation :-)

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).  Again, I believe that this is overspecification.  A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter.  But the O(1) requirement precludes a skip list
> implementation...

I agree this is over specified. Also, O(1) is a bit bogus given caches
anyway.

> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation.  I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy.  But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation.  For large maps, a tree-based implementation makes
> probably more sense.  We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.

I agree with Pascal here entirely

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).  More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296.  Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users.  Of all the proposals that I have seen, Sequence
> has my preference.  And I don't give a damn what the terminology is in
> Java or C++.

I really think the _Type suffix should be avoided; with few exceptions,
it is not at all RM style.

****************************************************************

From: Tucker Taft
Sent: Monday, February 9, 2004  12:37 PM

> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort.  This is misguided in my opinion, as Shellsort often
> behaves better in practice than Quicksort (in particular, if the input
> file is nearly in order).

The complexity specifications are intended to set expectations,
without being overly prescriptive.  If there are no shared expectations,
then the containers can end up being frustrating to use.
As usual, the requirement associated with a complexity specification
is that as N => infinity, there is some upper bound on the ratio
between the actual time and the given formula.  We should also
make it clear whether this is for the average case, or the worst case.
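
[Editor's note: written out as a formula, the requirement just described is

   \exists\, C > 0,\; N_0 \;:\; \forall\, N \ge N_0,\quad T(N) \;\le\; C \cdot f(N)

where T(N) is the measured time and f(N) is the stated bound (for example
N log N), for the average or the worst case as specified.]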

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!

Clearly we should say "no worse than O(N)".

> ...  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).  Again, I believe that this is overspecification.  A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter.  But the O(1) requirement precludes a skip list
> implementation...

I am not an expert on skip lists, but it seems critical to appropriate use
that any element of a vector is "directly addressable".  Random access
is a fundamental part of the abstraction, and if that is not efficient,
it will be very hard to create applications that work reasonably across
implementations.  There needs to be some kind of bound on random
access.  If you believe O(Log N) is acceptable, we can consider that.
For a vector, I personally expect O(1), where the constant factor is *very*
small, and the per-component space overhead ratio is no worse than 100%,
even for byte-sized components.

> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation.  I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy.  But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation.  For large maps, a tree-based implementation makes
> probably more sense.

Why?  I would have thought just the opposite.  A hashed map can provide
an average case of O(1), and there is nothing precluding using trees
for the few hash buckets that get big.

> ...  We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.

I feel like you are arguing both sides of the coin here.  You are objecting
to the behavior while at the same time saying we shouldn't specify it.
If it is clear that this is an abstraction whose performance is no worse
than an extensible hash table, then it is more likely it will be used
appropriately.  By doubling on each expansion, the number of reallocations
can be kept relatively small, and the pieces left behind are generally
just the right size for other growing hash tables.
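
[Editor's note: the arithmetic behind "doubling on each expansion": to grow
a table to N elements, the copies performed at the successive expansions sum
to

   1 + 2 + 4 + \cdots + \tfrac{N}{2} \;=\; N - 1 \;<\; N

so the total copying work is linear in N, and the amortized cost per
insertion stays O(1) even though an individual expansion is proportional to
the current size.]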

I suppose you could say that you will implement it in a way that makes
your particular customers happy, but I don't think that is a way to
create a standard.  The goal is portability, not only in terms
of correct execution, but also in terms of reasonable, relatively
predictable performance.

I agree we shouldn't overspecify, but nor should we underspecify.  We
need to specify enough to establish useful, reasonable expectations
for implementors and users, so the container library is not just a
toy, but is actually a useful part of the professional Ada programmer's
toolkit.  We certainly should never discourage implementors from
doing better than the minimal requirements, but nor should we encourage
them to deviate so much from the minimal requirements that they have
effectively created a different abstraction, interfering with
portability.

I see the error bounds specified for the elementary functions as a similar
exercise.  They establish expectations, which reduces confusion and
frustration, and helps make it clear when the language-defined functions
can be used appropriately, and when they can't.

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).  More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296.  Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users.  Of all the proposals that I have seen, Sequence
> has my preference.  And I don't give a damn what the terminology is in
> Java or C++.

Of course you don't give a damn.  But the question is whether other users
who do write significant amounts of code in other languages will appreciate
the effort to be part of the mainstream, rather than always trying to
swim in our own creek, elegant and pure as it may be.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  1:10 PM

Tucker Taft wrote:

> The complexity specifications are intended to set expectations,
> without being overly prescriptive.  If there are no shared expectations,
> then the containers can end up being frustrating to use.
> As usual, the requirement associated with a complexity specification
> is that as N => infinity, there is some upper bound on the ratio
> between the actual time and the given formula.  We should also
> make it clear whether this is for the average case, or the worst case.

Big O is not an upper bound, it is a description of asymptotic behavior.
As written, this spec would prohibit a sort whose behavior was
asymptotically linear.

> I am not an expert on skip lists, but it seems critical to appropriate use
> that any element of a vector is "directly addressible".  Random access
> is a fundamental part of the abstraction, and if that is not efficient,
> it will be very hard to create applications that work reasonably across
> implementations.  There needs to be some kind of bound on random
> access.  If you believe O(Log N) is acceptable, we can consider that.
> For a vector, I personally expect O(1), where the constant factor is *very*
> small, and the per-component space overhead ratio is no worse than 100%,
> even for byte-sized components.

What does constant factor mean here? A typical implementation of arrays
will have extremely variable behavior depending on caching. A naive model
in which all access is constant time is unrealistic in any case.

> Why?  I would have thought just the opposite.  A hashed map can provide
> an average case of O(1), and there is nothing precluding using trees
> for the few hash buckets that get big.

I personally think that any comments about performance should be
implementation advice, not requirements. You will get into all kinds
of formal mess if you try to make them requirements, but as IA they
are fine and comprehensible.

> Of course you don't give a damn.  But the question is whether other users
> who do write significant amounts of code in other languages will appreciate
> the effort to be part of the mainstream, rather than always trying to
> swim in our own creek, elegant and pure as it may be.

To me, sequence *is* more mainstream than vector. The latter phrase
comes with far too much baggage :-)

****************************************************************

From: Stephane Barbey
Sent: Monday, February 9, 2004  1:54 PM

Both IDL and UML (OCL) use "Sequence" for unbounded collections
of ordered elements that allow the same element more than once.

OCL offers Set, Bag, Sequence and Collection.

The Ada mapping to IDL offers Corba.Sequences.Unbounded
(and Bounded) packages that are similar in spirit (and in
specification) to what the Ada.Strings.Bounded and
Unbounded packages provide.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  2:07 PM

> I personally think that any comments about performance should be
> implementation advice, not requirements. You will get into all kinds
> of formal mess if you try to make them requirements, but as IA they
> are fine and comprehensible.

All of the performance "requirements" *are* written as Implementation
Advice. There isn't any way that I can think of to make them normative, and
in any case, that would be overspecification.

So, if Pascal wants to ignore them, he can -- he just has to document that
fact.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  3:23 PM

OK, sorry, I missed this; then I have no objection to any of the
statements, though there is still a bit of over-specification,
I would say :-)

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:27 PM

By the way, I find this discussion somewhat frustrating, because there
are discussions going on in ada-comment, and also on arg.  People are
raising some of the same points on both.  It seems like the ARG should
pay a lot of attention to real users on this issue, but I fear some key
ARG members are not currently listening to ada-comment, and many
ada-comment folks are not seeing the arg mailing list.

Sigh.

Anyway, Pascal Leroy said:

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!

I'm responding to Pascal's message, because it makes the point so
clearly, but this is really a more general comment.

This is the *usual* view of language design, and the usual view in the
Ada RM -- we specify the high-level semantics, and not the efficiency of
things.

However, I think for a container library, efficiency properties are the
key issue.

Consider "sequences" -- an ordered sequence of items, which can in
principle be numbered from 1 to N (or 0 to N-1, if the programmer
prefers).  There are many possible implementations of "sequence" --
singly-linked lists, doubly-linked lists with dummy header, growable
arrays, fixed-size arrays, etc.  Programmers choose among those
primarily for efficiency reasons.

Therefore, I think we should be thinking about a secondary standard that
contains a variety of "sequence" packages.  Each should be named
according to the intended implementation, so the programmer can choose
wisely.  We're saying "vector" (meaning "array-based" or "contiguous
hunk of storage") should be the one in the next RM -- but we expect
others, like linked lists.

So I disagree with Pascal above -- I think the container packages
*should* have a particular implementation in mind.  I'll even go further
than Randy, and say that instead of "O(1) access" I really want "a
vector/array-based implementation".

Now, you may say that's overspecification.  Why shouldn't the
implementer choose a "better" implementation?  Well, for containers,
there is no "better" -- they just have different efficiency properties
(better for some uses, worse for others).  As a programmer, I need to
know the underlying implementation.

The language designer cannot know which implementation of sequences is
"better".  Nor can the implementer.  Only the programmer can know.
Therefore, we should not let implementers choose, here.

If one implementer chooses "arrays, deallocated and reallocated when
growing" and the other implementer chooses "skip lists", it's a disaster
-- the programmer has no idea which package to choose.

I say, the vectors package should say (as Implementation Advice) "the
intended implementation is as an array", rather than saying something
about O(1) access.  As others have pointed out, there's really no such
thing as O(1) random access -- if you make the vector big enough, you
will get O(log N) because of cache or paging effects.

Then a secondary standard can define 17 other varieties of "sequence"
that have different efficiency properties.  None is "best" for all
purposes.  However, it is desirable that they all have interfaces that
are as similar as possible.

SUMMARY: Don't let implementers choose the one be-all end-all sequence
package.  We choose a particular sequence implementation
(vectors/arrays) that is useful, and let a secondary standard build all
the others.  Let the programmer choose among them.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  6:01 PM

Robert Duff:

> By the way, I find this discussion somewhat frustrating, because there
> are discussions going on in ada-comment, and also on arg.

Besides Bob's "real user" concerns, I am faced with the aggravating task of
filing two unrelated threads on the same topic, going on at the same time,
into the same AI. I fear no one is going to be able to make sense out of the
!appendix section...

...
> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind.  I'll even go further
> than Randy, and say that instead of "O(1) access" I really want "a
> vector/array-based implementation".

That's actually what the Implementation Advice says. But of course it is
Implementation Advice, so it has no force: Pascal can use a skip list if he
wants.

Going further than that would be useless and bad for at least some
implementations.

For instance, because of generic code sharing, the implementation of the
Vector type will essentially be an array of pointers. Because of that, I'll
probably implement this as an array of pointers, and use that to eliminate
copying in insert/delete/sort operations. Technically, that would still be a
correct implementation (insert would still be O(N), just the constant would
be a lot lower). But clearly, the ratio of execution times between the
various operations would be quite different for this package than for the
"canonical" implementation.

To avoid that, you'd pretty much have to specify the body of the package.
But even that doesn't really help. Again, looking at Janus/Ada, you're going
to get (implicit) allocations of the elements. So, for a Vector of
an elementary type, the cost of an Insert operation could be 20 times more than for
a non-sharing implementation. (While for a Vector of a type with an
expensive assignment, it might only be a few percent more.) For most uses of
the container, this difference in performance (which appears because the
unit is generic) is likely to matter more than the O(N) performance.

Of course, this is an extreme example, but it shows that the actual
performance of the container is going to depend heavily on the
implementation no matter what is specified in the standard. So going beyond
O(N) type specifications for key operations doesn't help, and could be
actively harmful (by preventing innovative implementations).

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  5:30 PM

Pascal wrote:

...
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).

For the record, here's the wording from the AI. (I wrote this, Matt wanted
it, but didn't know how to express it. I'm not sure I do either - but I knew
I didn't want to define the O(N) notation...)

  Implementation Advice

  Containers.Vectors should be implemented similarly to an array. In particular,
  the time taken by Append and Element should not depend on the number of
  items in the Vector, and the time taken by Insert or Delete at First of the
  vector should take time roughly proportional to the number of elements in the
  vector.

And you are correct, the last part of the sentence should say "no worse
than" or something like that. (Although I can't think of any implementation
that meets the first part that doesn't also meet the second part exactly -
you can reduce the constant arbitrarily, but it still is proportional to N.)

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).

Our original idea was to avoid the "_Type". However, when I tried to do
that, there were a lot of conflicts with package, subprogram, and parameter
names. In the interests of the getting a report done on time, we wanted to
avoid major surgery to the proposal. (Especially updating the examples would
be painful.) So we stuck with "_Type".

If there is a majority opinion that it is worth going forward with these
packages, and that changing the names would be preferred, then I can spend
the time to do it. But I don't want to spend the ARG's limited resources
doing major changes if all we're going to do is kill the proposal anyway. (I
would hope that no one votes against the proposal solely because they don't
like the names - although such a result wouldn't surprise me.)

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  6:01 PM

Randy Brukardt wrote:

> If there is a majority opinion that it is worth going forward with these
> packages, and that changing the names would be preferred, then I can spend
> the time to do it. But I don't want to spend the ARG's limited resources
> doing major changes if all we're going to do it kill the proposal anyway. (I
> would hope that no one votes against the proposal solely because they don't
> like the names - although such a result wouldn't surprise me.)

It's always risky to vote for something that is flawed with the
expectation of fixing it.

On the other hand, at least one delegation in Salem that was strongly
in favor of adding the keyword CLASS to the language voted against
JDI's proposal because they did not like the prefix notation (I told
Jean not to mix up the issues, but he did not listen to me). They
were quite dismayed that the proposal failed. So you never know...

(it is interesting to wonder what would have happened at Salem if that
delegation had understood how the vote worked and voted its actual
interests; then the vote would have been 3-2 in favor of "class X is ...",
and the US was poised to follow the winning side, so the eventual vote
would have been 4-2, and who knows what would have happened?)

(sorry to digress, but it's an interesting little piece of Ada trivia
history :-)

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  6:04 PM

Robert A Duff wrote:

> I'm responding to Pascal's message, because it makes the point so
> clearly, but this is really a more general comment.
>
 > lots of sensible stuff deleted here
>
> SUMMARY: Don't let implementers choose the one be-all end-all sequence
> package.  We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard build all
> the others.  Let the programmer choose among them.

I find Bob's comments here to make a lot of sense, and I agree with
all of them (yes I know that's a change in position, but I think the
fact that this is IA, and Bob's useful perspective make the difference).

****************************************************************

From: Pascal Leroy
Sent: Monday, February 9, 2004  4:58 AM

Bob chided me:

> By the way, I find this discussion somewhat frustrating,
> because there are discussions going on in ada-comment, and
> also on arg.  People are raising some of the same points on
> both.  It seems like the ARG should pay a lot of attention to
> real users on this issue, but I fear some key ARG members are
> not currently listening to ada-comment, and many ada-comment
> folks are not seeing the arg mailing list.

Sorry, the signal/noise ratio on Ada-Comment is too poor, I admit that I
don't have the patience to read all that stuff, and I didn't want to get
50 replies to my initial message.  Anyway, to avoid confusion, I promise
I will shut up until this topic is discussed face-to-face in Phoenix.

Randy pointed out:

> All of the performance "requirements" *are* written as
> Implementation Advice. There isn't any way that I can think
> of to make them normative, and in any case, that would be
> overspecification.

I realize that they are IA, and that's fine.  I am just arguing that the
advice as written excludes perfectly good implementations.  Of
course I can ignore it, but that's not a satisfactory answer to me: if
we put it in the RM it should be useful.

Tuck commented:

> I agree we shouldn't overspecify, but nor should we
> underspecify.  We need to specify enough to establish useful,
> reasonable expectations for implementors and users, so the
> container library is not just a toy, but is actually a useful
> part of the professional Ada programmer's toolkit.

I completely agree with this principle.  The performance advice is
only there to prevent "bad" implementations.  It should not constrain
"good" implementations.  For instance, using a bubble sort is a no-no,
but we want an implementer to be able to use heapsort, quicksort or
shellsort (or a combination of the three).  Similarly, a Vector should
not be implemented using a simple linked list, but an array or a skip
list are both valid implementations.

> If you believe O(Log N) is acceptable, we can consider that.

As others have pointed out, O(1) and O(Log N) are hardly distinguishable
in practice, it's only the multiplicative factor that counts, so yes, I
believe that we should allow O(Log N) access for vectors.

Back to Bob:

> This is the *usual* view of language design, and the usual
> view in the Ada RM -- we specify the high-level semantics,
> and not the efficiency of things.
>
> However, I think for a container library, efficiency
> properties are the key issue.

I don't see what makes a container library so different from all the
rest.  Let me draw your attention to the fact that we don't specify
efficiency properties for the string packages, or for the numerics
(including the matrix operations of AI 296).  I know that Bob doesn't do
numerics, but for people who do, the performance of these libraries is
likely to be more critical than that of containers.  In practice what
happens is that they run benchmarks, and talk sternly to their vendor if
they don't like the results.

> Therefore, I think we should be thinking about a secondary
> standard that contains a variety of "sequence" packages.
> Each should be named according to the intended
> implementation, so the programmer can choose wisely.  We're
> saying "vector" (meaning "array-based" or "contiguous hunk of
> storage") should be the one in the next RM -- but we expect
> others, like linked lists.

You are on the right track to kill this proposal with kindness ;-)

> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind.  I'll even
> go further than Randy, and say that instead of "O(1) access"
> I really want "a vector/array-based implementation".

But what do you gain if you don't specify the multiplicative factor?  I
have this wonderful implementation of vectors, I swear it's O(1), but
for some reason the multiplicative constant is such that it takes 1 sec
on average to access an element.  This is a Duff-compliant
implementation, but hardly a good one.

Surely you don't want to get into the business of specifying the factor,
right?  Unless of course your target is a MIX computer ;-)

> Now, you may say that's overspecification.  Why shouldn't the
> implementer choose a "better" implementation?  Well, for
> containers, there is no "better" -- they just have different
> efficiency properties (better for some uses, worse for
> others).

The same is true for everything.  For the elementary functions, you have
a trade-off between speed and accuracy.  Which is best?  Depends on the
application.  For the random numbers, there is a trade-off between
speed, size of the generator, and quality of the random numbers.  Again,
there is no better implementation.

> If one implementer chooses "arrays, deallocated and
> reallocated when growing" and the other implementer chooses
> "skip lists", it's a disaster
> -- the programmer has no idea which package to choose.

Either the programmer doesn't care, for instance because they only put a
few elements in the vector, and both implementations are fine (that's
Randy's viewpoint, I think).  Or the programmer does care, and he had better
run a simple benchmark with, say, a 10-million-element vector, and see
what happens.

> SUMMARY: Don't let implementers choose the one be-all end-all
> sequence package.  We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard
> build all the others.  Let the programmer choose among them.

SUMMARY: For once, I disagree with just about everything that Bob wrote.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 10, 2004  8:35 AM

> Bob chided me:

I didn't mean to chide you in particular.  In fact, I didn't mean to
chide anybody.  I was merely lamenting the fact that there is no forum
where the public (i.e. ada-comment folks) and the arg can discuss the
issue of containers.  Sorry.

> > However, I think for a container library, efficiency
> > properties are the key issue.
>
> I don't see what makes a container library so different from all the
> rest.  Let me draw your attention to the fact that we don't specify
> efficiency properties for the string packages, or for the numerics
> (including the matrix operations of AI 296).  I know that Bob doesn't do
> numerics, but for people who do, the performance of these libraries are
> likely to be more critical than that of containers.  In practice what
> happens is that they run benchmarks, and talk sternly to their vendor if
> they don't like the results.

It seems to me that for most features of the language, either efficiency
doesn't matter all that much, or else it's fairly obvious what the
efficiency properties will be.  I *know* (roughly) how compilers
represent integers, arrays, records, etc.  But there are many wildly
different ways to represent sequences.  I don't want Vectors represented
as skip lists any more than I want built-in arrays implemented as skip
lists.

There are a few cases like this in Ada already.  One example is
size-changing records (i.e. defaulted discriminants).  Some compilers
choose an allocate-the-max-size strategy, and others choose a heap-based
strategy.  The former is unacceptable when the max size is 2**33 bytes.
The latter is unacceptable in real-time systems that don't want heap
allocation, or whenever the extra level of indirection is too costly.
It's not obvious which implementation choice is "better".
If Ada 83 had specified (as a NOTE or whatever) which choice the
language designers expected, use of this feature would have been
much more portable.
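
[Editor's note: A minimal illustration of the size-changing record case, with
made-up names. Under an allocate-the-max strategy every object of this type
reserves room for Natural'Last characters; under a heap-based strategy each
object carries a hidden level of indirection instead.]

   type Buffer (Length : Natural := 0) is record
      Data : String (1 .. Length);  --  size depends on the discriminant
   end record;

   B : Buffer;  --  mutable object: Length (and so Data'Length) may change later
   --  allocate-the-max: reserve roughly Natural'Last bytes for B up front
   --  heap-based: allocate B.Data on the heap and reallocate it on assignment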

As to numerics, I don't know what I'm talking about, but I know that the
numerics annex is full of accuracy requirements.  Isn't the
implementer's goal simply "as fast as possible, given the accuracy
requirements"?  Are there wildly different implementation strategies?
I was under the impression that it's more like, "spend more money,
make the algorithms incrementally faster".

As to matrices, I don't know what I'm talking about there, either,
but don't we want all vendors to use a two-dimensional array?
An implementer that chose a sparse representation wouldn't be
doing any favors, right?

...
> But what do you gain if you don't specify the multiplicative factor?  I
> have this wonderful implementation of vectors, I swear it's O(1), but
> for some reason the multiplicative constant is such that it takes 1 sec
> on average to access an element.  This is a Duff-compliant
> implementation, but hardly a good one.

I trust implementers not to *deliberately* sabotage their products.
But implementers need to understand what's expected of them.
I want to say that for Vectors, an array-based implementation is
expected -- we're not asking for the world's greatest all-purpose
sequence package here; we're asking for growable arrays.
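
[Editor's note: A minimal sketch of the growable-array behaviour Bob is asking
for, with invented names; a real Vectors implementation would differ in detail,
and reclamation of the abandoned arrays is omitted.]

   generic
      type Element_Type is private;
   package Growable_Arrays is
      type Vector is private;
      procedure Append (V : in out Vector; Item : in Element_Type);
   private
      type Element_Array is array (Positive range <>) of Element_Type;
      type Element_Array_Access is access Element_Array;
      type Vector is record
         Data : Element_Array_Access;
         Last : Natural := 0;
      end record;
   end Growable_Arrays;

   package body Growable_Arrays is
      procedure Append (V : in out Vector; Item : in Element_Type) is
      begin
         if V.Data = null then
            V.Data := new Element_Array (1 .. 8);
         elsif V.Last = V.Data'Last then
            declare  --  double the capacity and copy; O(1) amortized per Append
               Bigger : constant Element_Array_Access :=
                 new Element_Array (1 .. 2 * V.Data'Last);
            begin
               Bigger (1 .. V.Last) := V.Data.all;
               V.Data := Bigger;  --  old array abandoned (no deallocation here)
            end;
         end if;
         V.Last := V.Last + 1;
         V.Data (V.Last) := Item;
      end Append;
   end Growable_Arrays;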

> Surely you don't want to get into the business of specifying the factor,
> right?  Unless of course your target is a MIX computer ;-)

Agreed.

> SUMMARY: For once, I disagree with just about everything that Bob wrote.

Oh, well.  :-(

****************************************************************

From: Pascal Leroy
Sent: Tuesday, February 10, 2004  10:14 AM

> I don't want Vectors represented as skip lists
> any more than I want built-in arrays implemented as skip lists.

But why?  You have to explain why, you cannot just say "I don't want".

When I look at the specification of Vectors, the first implementation
that comes to my mind is to use an array if the vector is not too large,
and dense enough.  If it becomes too large I would probably want to
switch to a skip list implementation: this would avoid the unreasonable
O(N) cost on insertion/deletion.  Similarly if the vector becomes very
sparse (not many active elements), I would switch to a skip list
implementation to save space (and indexing would become a bit more
costly).

Of course the skip list would not store individual elements, but
probably chunks that have sufficiently high density.

Surely there are a number of parameters/thresholds to be selected to do
the switch from array to skip list, but they should be easy to select by
graphing the space/time characteristics of each algorithm and looking
for the point where these characteristics intersect.
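
[Editor's note: A rough sketch of the hybrid representation Pascal describes,
with invented names and an arbitrary threshold; the "skip list" is reduced to
a plain linked list of chunks to keep the sketch short.]

   package Hybrid_Vector_Sketch is
      type Element_Array is array (Positive range <>) of Integer;
      --  Integer stands in for the generic Element_Type
      type Element_Array_Access is access Element_Array;

      type Chunk;
      type Chunk_Access is access Chunk;
      type Chunk is record
         Data : Element_Array_Access;  --  a dense run of elements
         Next : Chunk_Access;          --  a real skip list keeps several links
      end record;

      Switch_Threshold : constant := 10_000;
      --  chosen by graphing the space/time characteristics of both forms

      type Representation_Kind is (Contiguous, Chunked);

      type Vector_Rep (Kind : Representation_Kind := Contiguous) is record
         Length : Natural := 0;
         case Kind is
            when Contiguous =>
               Data : Element_Array_Access;
            when Chunked =>
               Chunks : Chunk_Access;
         end case;
      end record;
      --  On insertion, if Kind = Contiguous and Length exceeds Switch_Threshold
      --  (or the vector becomes too sparse), copy the elements into chunks and
      --  carry on with the Chunked representation.
   end Hybrid_Vector_Sketch;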

Incidentally we do this for string search: for small strings we use the
naïve algorithm, and when the string becomes large we switch to
Boyer-Moore.  You could call this overengineering, but as a user I don't
see why you would complain.

Now you've got to explain to me what is wrong with this approach.  Let
me say it again: this is the first thought that comes to my mind when I
read the specification of Vectors, so I'd like to be educated.

****************************************************************

From: Robert Dewar
Sent: Tuesday, February 10, 2004  10:23 AM

Surely everyone would prefer a skip list if using a contiguous vector
would force page faults for every access to the vector.

I assume the real interest here is speed, not O(1) at the cost of
any constant.

Anyway this is only IA :-)

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 10, 2004  11:11 AM

Pascal wrote:

> But why?  You have to explain why, you cannot just say "I don't want".

I've got nothing against skip lists.  What I really want is
uniformity of efficiency across implementations.  The only way I know
how to achieve that is for the programmer to choose among basic
implementation strategies.

> When I look at the specification of Vectors, the first implementation
> that comes to my mind is to use an array if the vector is not too large,
> and dense enough.  If it becomes too large I would probably want to
> switch to a skip list implementation: this would avoid the unreasonable
> O(N) cost on insertion/deletion.  Similarly if the vector becomes very
> sparse (not many active elements), I would switch to a skip list
> implementation to save space (and indexing would become a bit more
> costly).

OK, now you're talking about a hybrid strategy.  I don't see how the
implementation could know about sizes and densities at compile time, so
I assume what you mean is that the Vector implementation gathers
statistics at run time, and switches among different strategies based on
that information.

The overhead of gathering statistics and checking them at relevant times
is worth it in some cases, and not in others.  All I'm saying is that
only the programmer can make that choice.  In my current project, we use
growable arrays that are almost always quite small.  The above "fancy"
implementation would be inappropriate.

Now if you say the "fancy" implementation is a good one, fine, then the
RM should encourage *all* implementers to use it.  Then I, as a
programmer, can know that I don't want to use the language-defined
Vectors package.  In other cases, I can decide that the language-defined
package is appropriate.  But if I have no idea what the underlying
implementation is, I can *never* use the language-defined package (except
perhaps in toy programs that don't care about efficiency, or portability
thereof).

> Of course the skip list would not store individual elements, but
> probably chunks that have sufficiently high density.
>
> Surely there are a number of parameters/threshold to be selected to do
> the switch from array to skip list, but they should be easy to select by
> graphing the space/time characteristics of each algorithm and looking
> for the point where these characteristics intersect.
>
> Incidentally we do this for string search: for small strings we use the
> naïve algorithm, and when the string becomes large we switch to
> Boyer-Moore.  You could call this overengineering, but as a user I don't
> see why you would complain.

Well, I suppose I wouldn't complain about that.

> Now you've got to explain to me what is wrong with this approach.  Let
> me say it again: this is the first thought that comes to my mind when I
> read the specification of Vectors, so I'd like to be educated.

There's nothing wrong with that approach (I assume we're talking about
the Vector case, not the string-search case).  But if you choose that
approach, and some other compiler-writer chooses a wildly different
approach, the programmer will be lost.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  6:54 PM

Jeffrey Carter:

> Randy Brukardt wrote:
> >
> > Huh? You've said, in effect, that the performance isn't good enough
> > for applications where the performance doesn't matter. That's a
> > pretty goofy statement!
>
> Actually, you originally said something like that. You have said
>
> 1. That the vector component should only be used by applications where
> performance doesn't matter.
>
> 2. That the difference in performance between possible implementations
> of vector may be critical to applications that use it.
>
> If performance doesn't matter to these applications, then the
> restriction on implementations should be removed. However, I agree with
> you that even applications that are suitable for the use of standard
> components may find the performance difference between different
> implementations critical.

That's what I get for trying to argue a position that I don't believe in.

My position is that performance does not matter for these components.
Period.

However, that's a minority position, and I understand the other argument.

The trouble with including performance is that then you must have enough
container forms to handle the most common performance profiles - that means
at least 4 sequence containers (and probably more - at least bounded and
unbounded forms, and list and vector forms) and similarly at least 8
associative containers, and we simply don't have the manpower to properly
specify such a library.

But in any case, I'm obviously not good at arguing the current position,
and I'm not going to try anymore.

---

That said, my opinion is that the only container worth having (with no
performance requirements) is a map. The Set isn't sufficiently different,
and no sequence container is worth the effort. And such a container probably
ought to hold indefinite elements. (Performance doesn't matter, remember.)

But that position is a minority position (of one!), and I'm not going to
argue that, either.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  7:13 PM

> That's what I get for trying to argue a position that I don't believe in.

;-)

> My position is that performance does not matter for these components.
> Period.
>
> However, that's a minority position, and I understand the other argument.

I'm afraid that I take the opposite position: efficiency is the key
issue.  I'll take the liberty of reposting my response on arg here:

[Editor's note: This word-for-word repeat of 50+ lines is removed to keep these
comments manageable. You can find it about 600 lines back; look for "This is
the *usual* view of language design..." in a message from Bob on Monday
at 4:27 PM]

> The trouble with including performance is that then you must have enough
> container forms to handle the most common performance profiles - that means
> at least 4 sequence containers (and probably more - at least bounded and
> unbounded forms, and list and vector forms) and similarly at least 8
> associative containers, and we simply don't have the manpower to properly
> specify such a library.

I'm saying we should lead the way toward those 4 or 8, as opposed to
trying to be the last word on "sequences" or "mappings" or etc.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  7:18 PM

Jeffrey Carter wrote:

> Regarding Size and Resize, you wrote:
>
> > That's no different than many of the attributes in Ada, which (if set),
> > always return the values that they were set to. But what the compiler does
> > with those values is (almost) completely implementation-defined.
>
> There is a difference between a compiler directive and an operation of a
> package. The latter must have well defined behavior that is not
> implementation defined.

That's a goofy statement. There are lots of package operations in Ada that
have implementation-defined behavior. Try any file Open or anything in
Ada.Command_Line, for instance.

> > Huh? Resize tells the container a reasonable size to use; what the container
> > does with that information is up to it. Size simply returns that information.
>
> What does Size return if Resize has not been called?

The implementation-defined initial size of the container.

Note that there is still quite a bit of overspecification in some of the
wording. I didn't have the time or energy to rewrite every second line of
Matt's proposal, and it wasn't clear that I had the support of the committee
to do so, either.

> If the intention is as you described, then the operations appear to be
> useless, and should be eliminated.

Why? Giving a container an idea of how many elements it will contain can be
a great efficiency help. But there shouldn't be any specification of what it
will mean.

> The introductory text to Vectors does not make it clear that this is an
> extensible array (EA).

Probably because no one uses such a term! The first time I can recall anyone
talking about extensible arrays in my 25+ years of programming (including my
college courses) was last week. I of course know what is meant because the
words have their conventional meanings, but I doubt that there are many
people out there looking up "extensible array" in an index!

...
> So, if the ARM gains a mathematical library of matrices and vectors,

It already did. See AI-296, already approved. (Note that this is an old Ada
83 standard that has not been widely used - but the fact remains that Ada
has had vectors in the mathematical sense for a long time.)

> However, this is really a general problem, and a general solution might
> be advisable. There are no predefined modular types in Standard, so we
> might want to add
>
> type Maximal_Count is mod implementation-defined;

Adding types to Standard is dangerous, because they hide ones visible via a
use-clause. We're not planning to add anything named to Standard for this
reason. Adding it to Ada could cause trouble if there is a use clause for
Ada in a program. So, I'd suggest such a type be added to Ada.Containers
(next to Hash_Type).
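
[Editor's note: In RM style, the suggestion above would come out roughly as
follows; the name Maximal_Count is the one proposed in the quoted message, not
settled wording, and "implementation-defined" is the usual RM placeholder
rather than compilable Ada.]

   package Ada.Containers is
      type Hash_Type     is mod implementation-defined;
      type Maximal_Count is mod implementation-defined;
   end Ada.Containers;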

> I don't understand why the string-keyed maps exist, since they are
> equivalent to a map with an unbounded string key. The implementation
> would have to store the provided key in an appropriate unbounded string,
> or duplicate the functionality of unbounded strings.

No, a stringspace implementation would be much better than Unbounded_String
for storing large numbers of strings of unknown length. That's precisely the
idea of this component (and the reason it exists separately).
Unbounded_Strings require many tiny allocations, while a stringspace
implementation requires just one (or a few) larger ones.

...
> This discussion of the searchable structure and the map based on it
> seems to indicate a basic design problem with the hashed map component.
> A hash table is not trivial to implement correctly. There are uses for
> hash tables other than maps. As it stands, the user who wants a hash
> table must create one, duplicating the effort performed for the map, and
> increasing the likelihood of errors.

Huh? What could you do with a separate hash table that you couldn't do with
a map? The hash "buckets" contain *something*, and that something is (or can
be) the same as the map elements.

I suspect that if you try to develop this separate container, you'll end up
with pretty much the same interface as map - so there is no reason for a
separate version.

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  12:52 PM

Randy Brukardt wrote:

> That's a goofy statement. There are lots of package operations in Ada
> that have implementation-defined behavior. Try any file Open or
> anything in Ada.Command_Line, for instance.

At least I'm consistent :) I agree. In retrospect, I worded that badly,
using general terms when referring to specifics.

>>> Huh? Resize tells the container a reasonable size to use; what the container
>>> does with that information is up to it. Size simply returns that information.
>
>> What does Size return if Resize has not been called?
>
> The implementation-defined initial size of the container.

OK. Let's see if I understand your position correctly. Resize gives the
implementation a hint about a reasonable size to use, but the
implementation may do whatever it wants, including nothing. Size returns
the actual size of something if Resize has not been called, but the last
size given to Resize if Resize has been called, regardless of what the
implementation does (or doesn't) do with that size.

So it appears that you are saying the implementation is required to keep
track of whether Resize has been called, and to store the size passed to
Resize. That doesn't seem like a very useful requirement to me.

It's fun to argue this kind of thing, but we're really wasting time. My
concern is not really what you think should be required, but what the
proposal actually requires.

> Giving a container an idea of how many elements it will contain can
> be a great efficiency help. But there shouldn't be any specification
> of what it will mean.

That's fine. But the specification of Resize requires that it perform an
allocation. That's primarily why these operations concern me.

Allowing the user to know the current size doesn't seem very useful to
me, but I don't see how it can hurt. Allowing the user to force a resize
does seem unwise.

Resize is an appropriate name for the operation as specified. I expect
an operation named Resize to cause resizing. If we're really talking
about giving the implementation a hint about an appropriate size, then
not only does the specification need to be changed, the name also needs
to be different (perhaps Size_Hint?).

> Note that there is still quite a bit of overspecification in some of
> the wording. I didn't have the time or energy to rewrite every second
> line of Matt's proposal, and it wasn't clear that I had the support
> of the committee to do so, either.

Right, and most of my comments were identifying such areas and
presenting alternative wording. I hope, as such, they are useful. I can
understand you not being able to correct all of these, but if they are
not corrected, the current proposal is unacceptable.

Normally, I would think the original author would be the best person to
make such changes. However, Heaney's response to suggestions that the
proposal could be improved has uniformly been that the proposal is
correct as it stands (although I see that after saying that the vector
doesn't need an iterator, he has now added an iterator to his reference
implementation). Perhaps he is more amenable to requests for
modifications from the select committee.

I do have time at the moment, and am willing to make the effort if that
is desired. The committee needs to ask, since I'm unwilling to waste the
effort.

> Probably because no one uses such a term! The first time I can recall
> anyone talking about extensible arrays in my 25+ years of programming
> (including my college courses) was last week. I of course know what
> is meant because the words have their conventional meanings, but I
> doubt that there are many people out there looking up "extensible
> array" in an index!

Surely I didn't invent the term! I agree with you, though. This is a
case where I'm familiar with the concept and have used versions of it
for decades, but I've never encountered a general name for it, except I
know it's not a vector. By analogy to unbounded strings, perhaps
unbounded array is best.

> It already did. See AI-296, already approved. (Note that this is an
> old Ada 83 standard that has not been widely used - but the fact
> remains that Ada has had vectors in the mathematical sense for a long
> time.)

Good. I was not aware of this standard. However, this simply reinforces
my opposition to calling unbounded arrays "vectors".

> Adding types to Standard is dangerous, because they hide ones visible
> via a use-clause. We're not planning to add anything named to
> Standard for this reason. Adding it to Ada could cause trouble if
> there is a use clause for Ada in a program. So, I'd suggest such a
> type be added to Ada.Containers (next to Hash_Type).

OK. This should be useful for more than containers, so I'd like to see
it somewhere higher in the hierarchy, though the most important thing is
to avoid defining such types all over the place, like the Count types in
the IO packages. The odds of a conflict if it's in Ada are small, so I
wouldn't think that would be a problem. If the ARG/committee objects to
putting it in Ada, perhaps there should be a special child package for
such things.

> No, a stringspace implementation would be much better than
> Unbounded_String for storing large numbers of strings of unknown
> length. That's precisely the idea of this component (and the reason
> it exists separately). Unbounded_Strings require many tiny
> allocations, while a stringspace implementation requires just one (or
> a few) larger ones.

In general, a key is added to a map only once, and never modified. Using
Unbounded_String would, therefore, only need one allocation per key, so
I don't see that many tiny allocations are needed. However, you probably
know more about this sort of thing, since compilers need to do this kind
of thing a lot, so I may well be mistaken.

> Huh? What could you do with a separate hash table that you couldn't
> do with a map? The hash "buckets" contain *something*, and that
> something is (or can be) the same as the map elements.

Suppose I want to store Integers in a hash table so I can determine if
I've seen one before. There is no mapping from Integers to anything
else. Yes, I can do that with a map, by providing a dummy type for the
element type, and a dummy value for the element parameters, but that's
an ugly kludge. Defining a map in terms of a hash table is neither ugly
nor a kludge.

> I suspect that if you try to develop this separate container, you'll
> end up with pretty much the same interface as map - so there is no
> reason for a separate version.

A hash table doesn't have an operation to obtain an element given a key,
for example. I could agree that there's no reason for a separate map
given a hash table, since maps are trivial to implement with a hash
table, but ideally I'd like to see both. Ada can do better than
expecting the use of ugly kludges.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  1:08 PM

Jeffrey Carter wrote:

> Normally, I would think the original author would be the best person to
> make such changes. However, Heaney's response to suggestions that the
> proposal could be improved has uniformly been that the proposal is
> correct as it stands (although I see that after saying that the vector
> doesn't need an iterator, he has now added an iterator to his reference
> implementation). Perhaps he is more amenable to requests for
> modifications from the select committee.

I said a vector doesn't need an *active* iterator.  My opinion on that
matter hasn't changed: active iterators (aka "cursors") are too
error-prone for (array-based) vectors.

I wasn't sure whether we needed *passive* iterators for a vector, since
Ada already provides a built-in for loop.  However, there has been
interest, and so *passive* iterators were added.

> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before.

Use a set, not a map.
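
[Editor's note: A small sketch of the difference, with invented operation
names rather than the exact AI-302 interface. With only a map available, the
element type degenerates to a dummy; with a set, the dummy disappears.]

   type Dummy is null record;              --  map elements carry no information
   Nothing : constant Dummy := (null record);

   --  With a map (invented names), recording that N was seen:
   --     if not Is_In (N, Seen) then
   --        Insert (Seen, Key => N, New_Item => Nothing);
   --     end if;
   --
   --  With a set, the dummy element disappears:
   --     if not Is_In (N, Seen) then
   --        Insert (Seen, Item => N);
   --     end if;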

The latest version of the reference implementation now supports the
stream attributes for containers.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  6:35 PM

Matthew Heaney wrote:
> I said a vector doesn't need an *active* iterator.  My opinion on
> that matter hasn't changed: active iterators (aka "cursors") are too
>  error-prone for (array-based) vectors.
>
> I wasn't sure whether we needed *passive* iterators for a vector,
> since Ada already provides a built-in for loop.  However, there has
> been interest, and so *passive* iterators were added.

The actual things said were:

>> Vector should have an iterator, in addition to allowing the user to
>>  explicitly iterate over the structure.
>
> No.  Vector iterators are fragile, and hence very error prone.
>
> They are fragile because the (logical) internal array gets thrown
> away during expansion, which invalidates the iterator.  It's too hard
> to keep track of whether a vector iterator is still valid, and most
> of the time you end up with a dangling reference.

I was discussing the proposal in AI-302-03, so of course I used its
terminology. I did not mention cursors, nor did you.

You should also look "active" and "passive" up in a good dictionary.
Then perhaps you would discover what they mean, and realize that cursors
are passive and procedures are active.

Precision in terminology is important.

>> Suppose I want to store Integers in a hash table so I can determine
>> if I've seen one before.
>
> Use a set, not a map.

A typical answer: the proposal is perfect, therefore any problem a user
has with it must be with the user, not the library.

Yes, I want a set, but I want a hashed set, not one based on an O(log N)
search, perhaps because I know that with my hash function and expected
distribution of values, I can expect O(1) from a hash table.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:33 AM

The terms "active iterator" and "passive iterator" are discussed in
section 7.3, Variations on a Theme: Iterators, of Grady Booch's book describing
the original Booch Components library:

Software Components With Ada
Grady Booch
Benjamin/Cummings Publishing Company 1987

p. 157-8: "Basically, there are two approaches to iteration, called
active and passive.  In the active approach, we expose the iterator as a
collection of primitive operations, but, in the passive approach, we
export only a single operation."

p. 158: "We shall first discuss the active iterator.  The iterator can
be considered an object of an abstract data type, characterized by the
following operations: Initialize, Get_Next, Value_Of, Is_Done."

p. 159: "With the passive iterator, rather than exporting the type
Iterator and its associated operations, we instead export a single
generic procedure that is nested in the specification of the queue
component."

The Iterator design pattern (aka "Cursor") is described in:

Design Patterns: Elements of Reusable Object-Oriented Software
Erich Gamma et al
Addison-Wesley Publishing Company 1995

p. 260: "Who controls the iteration? A fundamental issue is deciding
which party controls the iteration, the iterator or the client that uses
the iterator.  When the client controls the iteration, the iterator is
called an external iterator, and when the iterator controls it, the
iterator is an internal iterator. [footnote on p.260: Booch refers to
external and internal iterators as active and passive iterators,
respectively.  The terms "active" and "passive" describe the role of the
client, not the level of activity of the iterator.]  Clients that use an
external iterator must advance the traversal and request the next
element explicitly from the iterator.  In contrast, the client hands an
internal iterator an operation to perform, and the iterator applies that
operation to every element in the aggregate."

The footnote in Gamma was referring to the information in the section
Iteration in Chap. 9 (Frameworks) of:

Object-Oriented Analysis and Design with Applications, 2nd ed
Grady Booch
Benjamin/Cummings Publishing Company 1994

p. 356: "For each structure, we provide two forms of iteration.
Specifically, an active iterator requires that clients explicitly
advance the iterator; in one logical expression, a passive iterator
applies a client-supplied function, and so requires less collaboration
on the part of the client. [footnote on p. 356: Passive iterators
implement an "apply" function, an idiom commonly used in functional
programming languages.]"


Section 8.3.6 (Iterators) of the Ada95 Quality and Style Guide explains
the difference between active iterators and passive iterators as
follows:

"The terms active and passive are used to differentiate whether the
iteration mechanism (i.e., the way in which the complex data structure
is traversed) is exposed or hidden. A passive iterator hides the
traversal (e.g., looping mechanism) and consists of a single operation,
iterate, that is parameterized by the processing you do on each element
of the data structure. By contrast, an active iterator exposes the
primitive operations by which you traverse the data structure (Booch
1987)."

<http://www.adaic.com/docs/95style/html/sec_8/8-3-6.html>


My article at adapower.com, "Iterator and Factory Method Patterns
Combined," describes the difference between an active and passive
iterator as follows:

"There are two kinds of iterators: passive and active. A passive iterator
controls the actual movement within the data structure, and all a client
has to do is supply a procedure to receive each item in turn.

"An active iterator moves the responsibility for movement onto the
client. Unlike a passive iterator, which is essentially just a generic
subprogram, an active iterator is an actual type, with primitive
operations for retrieving the current item and for moving to the next
item in the sequence."

<http://www.adapower.com/alg/activeiter.html>


The "Algorithms and Data Structures I" (CS 131) course at the Dept of
Computer Science of The George Washington University has this to say
about the distinction between passive and active iterators:

"The linked-list package introduced in Section 8.2 provides an operation
called Traverse, which moves through the list, from beginning to end,
one element at a time, until each element has been "visited" exactly once.

"Formally, this Traverse operation is an example of a passive iterator
operation. An iterator is any operation that iterates through a data
structure one element at a time; we call it passive because the client
program simply calls it once and "stands back" passively while the
iterator roams through the entire structure. In this note, we use the
terms traversal and iteration interchangeably.

"Sometimes an application requires iterating through a structure,
touching each element once, but allowing the client program the
flexibility to decide just when to proceed to the next element. Moving
through a structure in this fashion is called active iteration, because
the client program is actively involved in the process at every step.

Active Iterator Operations: To be actively involved in the iteration,
the client program must execute a loop. We know that any loop must
contain statements for loop initialization, termination, and
incrementation; to support active iteration, the data structure package
must provide these operations, and also one for retrieval of the current
element in the traversal."

<http://www.seas.gwu.edu/~csci131/fall01/active-traversals.html>


The "Advanced Object-Oriented Design & Programming" (CS 635) at San
Diego State University says this about passive iterators:

"Neither Java nor C++ support passive iterators. Smalltalk does support
them. In a passive iterator, you pass a method or function to the
composite object, and the object then applies the method to all elements
in the object."

<http://www.eli.sdsu.edu/courses/spring01/cs635/notes/object/object.html>

In the topic "Generic Programming: Iterators" in the CS 412/512 course
at Old Dominion University, section 1.1 defines passive and active
iterators this way:

"Iterators can be:

o passive: we pass a function to the iterator and tell it to apply the
function to each item in the collection

o active: we ask the iterator to give us items, and each time it does,
we apply the desired function to it."

<http://www.cs.odu.edu/~zeil/cs412/Lectures/09iterators/iterators_summary.pdf>


In his description of the Bedrock framework for Macintosh apps, Scott
L. Taylor describes the iterators of the C++ Booch Components as
follows:

"Each structure comes with its own form of an iterator that allows
traversal of items within a structure. Two types of iterators are
provided for each structure, passive and active. Passive iterators
require much less interaction on the part of the client. A passive
iterator is instantiated and used by calling the iterator's apply()
method with a function pointer to the function to apply to all the
elements within the structure. Active iterators allow much more
flexibility but require more interaction from the client. Active
iterators must be told to go on to the next item, and the iterator
object returns a reference to each item in the structure for the client
to process or use. Active iterators are very similar to MacApp style
iterators."

<http://www.mactech.com/articles/frameworks/7_6/Booch_Components_Taylor.html>


The iterators of the container classes in the ET++ framework are
described like this:

"There are two types of iterators - passive and active iterators. The
latter provide methods for iterating to be called directly by the client
while with passive iterators the client provides a method to be called
on each element in the container."

<http://swt.cs.tu-berlin.de/~ron/diplom/node58.html>


In "An Overview of the Booch Components for Ada95," the iterators are
described this way:

"There are two forms: active and passive. Active iteration requires the
client explicitly advance the iterator. For passive, the client supplies
a single function "Apply" to work across the structure."

<http://www.rivatech.com/booch/documentation.html>
<http://www.pogner.demon.co.uk/components/bc/documentation.html>

> Precision in terminology is important.

Indeed, and my use of the terms "active iterator" and "passive iterator"
is consistent with the references cited above.

> Yes, I want a set, but I want a hashed set, not one based on an O(log N)
> search, perhaps because I know that with my hash function and expected
> distribution of values, I can expect O(1) from a hash table.

My original proposal had hashed sets.  (It also had sorted maps.)
However, in order to reduce the scope of the change to the language
standard, the size of the proposal was reduced, and hashed sets didn't
make the cut.  No one got every container they wanted, not even me.

If you need a hashed set right now, then just grab the hash table from
the reference implementation and assemble it yourself.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 12:56 PM

Matthew Heaney wrote:

> The terms "active iterator" and "passive iterator" are discussed in
> section 7.3, Variations on a Theme: Iterators, in his book describing
> the original Booch Components library:
>
> Software Components With Ada
> Grady Booch
> Benjamin/Cummings Publishing Company 1987

I'm familiar with Booch and the many errors he made in this book. I'm
also aware that many others are unable to think for themselves and have
slavishly followed his lead. I see that you have not looked up "active"
and "passive" and thought about what the phrase "active iterator"
actually means in English. You have simply quoted the errors of others.
Argument by authority is always suspect.

We now have a situation where the terms are actively confusing, and no
one who wants to communicate effectively uses them.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  1:41 PM

`I don't know what you mean by "glory,"' Alice said.

Humpty Dumpty smiled contemptuously. `Of course you don't -- till I tell
you. I meant "there's a nice knock-down argument for you!"'

`But "glory" doesn't mean "a nice knock-down argument,"' Alice objected.

`When _I_ use a word,' Humpty Dumpty said in rather a scornful tone, `it
means just what I choose it to mean -- neither more nor less.'

`The question is,' said Alice, `whether you CAN make words mean so many
different things.'

`The question is,' said Humpty Dumpty, `which is to be master -- that's
all.'

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  2:07 PM

> We now have a situation where the terms are actively confusing, and no
> one who wants to communicate effectively uses them.

Please let's define them, then:
   - active iterator: use of a Cursor_Type object
   - passive iterator: use of a generic iteration procedure
I hope that's right...

/*
Personally I don't find the active/passive metaphor the most appropriate.
Manual/automatic would be more fitting. Active/passive for me is more
suggestive of program/data and read-and-write/read-only.

Also a confusion was that before alternative 3 "iterator" meant two different
things, namely the cursor and the abstract procedure (use of...). But now it
only means the latter.
*/

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  5:46 PM

Marius Amado Alves wrote:
>
> Please let's define then:
>    - active iterator: use of a Cursor_Type object
>    - passive iterator: use of a generic iteration procedure
> I hope that's right...

No, the proposal has this right:

Cursor : a value that indicates a specific element in a container.
Iterator: a procedure that applies an action to each element in a
container in turn.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:38 AM

>    - active iterator: use of a Cursor_Type object

Yes.

>    - passive iterator: use of a Generic_Iteration procedure

Yes.

> I hope that's right...

Yes, that's correct.

> Also a confusion was that before alternative 3 "iterator" meant two different
> things, namely the cursor and the abstract procedure (use of...). But now it
> only means the latter.

An iterator is a mechanism for visiting elements in a container.  There
are two kinds of iterators: "active" iterators and "passive" iterators.
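
[Editor's note: A minimal sketch of the two styles over a hypothetical list
container; all of the names (Cursor, First, Has_Element, Next, Element,
Generic_Iteration) are invented for the illustration, not taken from the
proposal.]

   --  Active (cursor-based): the client drives the traversal.
   declare
      C : Cursor := First (My_List);
   begin
      while Has_Element (C) loop
         Process (Element (C));
         Next (C);
      end loop;
   end;

   --  Passive (generic iteration procedure): the container drives the
   --  traversal and calls back once per element.
   declare
      procedure Visit (E : in Element_Type) is
      begin
         Process (E);
      end Visit;
      procedure Visit_All is new Generic_Iteration (Action => Visit);
   begin
      Visit_All (My_List);
   end;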

****************************************************************

From: Stephen Leake
Sent: Tuesday, February 10, 2004  2:45 PM

Jeffrey Carter <jrcarter@acm.org> writes:

> Allowing the user to know the current size doesn't seem very useful to
> me, but I don't see how it can hurt. Allowing the user to force a resize
> does seem unwise.

The user could run her application for a while, then query the current
size of the map and store it in a config file. Then, when the
application starts the next time, it reads the required size from the
config file, and calls Map.Resize. The intent is that this allows the
application to avoid all the resizes on the second run.
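
[Editor's note: A small sketch of that usage; the file handling is ordinary
Ada.Integer_Text_IO, while Size and Resize are the operations of the proposal
under discussion and the map and file names are invented.]

   --  At shutdown: remember how large the map became.
   Ada.Integer_Text_IO.Put (Config_File, Size (Word_Map));

   --  At startup on the next run: pre-size the map before any insertions,
   --  so the second run avoids all the intermediate expansions.
   Ada.Integer_Text_IO.Get (Config_File, Initial_Size);
   Resize (Word_Map, Initial_Size);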

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  3:00 PM

That's one (clever) application of Resize.  The intent is that if you
know a priori what the ultimate number of elements will be, then this
avoids any expansion during insertion.  Insertion behavior is thus more
uniform.

See the examples in ai302/hash and ai302/hash2 in the reference
implementation for more ideas.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  6:40 PM

I think I was talking about vectors.

Length is sufficient for this. The main problem is that the
specification prohibits some implementations: Resize is specified as
requiring an allocation, which may not be appropriate for some
implementations. Size_Hint, with no requirement what the implementation
does with the value, is more appropriate.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004 10:37 PM

Jeffrey Carter wrote:

(Sorry, I missed this yesterday.)

...
> OK. Let's see if I understand your position correctly. Resize gives the
> implementation a hint about a reasonable size to use, but the
> implementation may do whatever it wants, including nothing. Size returns
> the actual size of something if Resize has not been called, but the last
> size given to Resize if Resize has been called, regardless of what the
> implementation does (or doesn't) do with that size.
>
> So it appears that you are saying the implementation is required to keep
> track of whether Resize has been called, and to store the size passed to
> Resize. That doesn't seem like a very useful requirement to me.

Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
effect on Obj'Size, but in any case, if you set it, you have to return the
same value (even if that value has nothing to do with how objects are
actually stored).
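
[Editor's note: A minimal sketch of the obligation Randy describes, with
invented component names: the container simply remembers the last value given
to Resize, whatever it chooses to do with it internally.]

   type Map is record
      --  buckets, element count, and so on omitted
      Requested_Size : Natural := 16;  --  implementation-defined initial size
   end record;

   procedure Resize (Container : in out Map; Size : in Natural) is
   begin
      Container.Requested_Size := Size;
      --  possibly reorganize the table now, later, or never
   end Resize;

   function Size (Container : Map) return Natural is
   begin
      return Container.Requested_Size;
   end Size;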

...
> Resize is an appropriate name for the operation as specified. I expect
> an operation named Resize to cause resizing. If we're really talking
> about giving the implementation a hint about an appropriate size, then
> not only does the specification need to be changed, the name also needs
> to be different (perhaps Size_Hint?).

I don't see a strong need to change the name, but I do agree with you that
there shouldn't be a *requirement* to do some allocation.

...
> > No, a stringspace implementation would be much better than
> > Unbounded_String for storing large numbers of strings of unknown
> > length. That's precisely the idea of this component (and the reason
> > it exists separately). Unbounded_Strings require many tiny
> > allocations, while a stringspace implementation requires just one (or
> > a few) larger ones.
>
> In general, a key is added to a map only once, and never modified. Using
> Unbounded_String would, therefore, only need one allocation per key, so
> I don't see that many tiny allocations are needed. However, you probably
> know more about this sort of thing, since compilers need to do this kind
> of thing a lot, so I may well be mistaken.

One allocation per key is a lot more than one allocation per *map*, which is
what a stringspace implementation takes. (Well, it might have to expand if
it gets full, but that should be rare. It could degrade to one allocation
per key if the keys are very, very long, but some care in implementation
should prevent degrading.)
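
[Editor's note: A rough sketch of the stringspace idea, with invented names:
all keys live in one large buffer, and each key is represented by an
(offset, length) pair into that buffer. Growth assumes one doubling is enough,
and the abandoned buffer is not reclaimed here.]

   type String_Access is access String;

   type Key_Ref is record
      First  : Positive;  --  where the key starts within the space
      Length : Natural;
   end record;

   type String_Space is record
      Space : String_Access := new String (1 .. 64 * 1024);
      Used  : Natural := 0;
   end record;

   procedure Add (S : in out String_Space; Key : in String; Ref : out Key_Ref) is
   begin
      if S.Used + Key'Length > S.Space'Last then
         declare  --  rare: the whole space doubles, not each key
            Bigger : constant String_Access :=
              new String (1 .. 2 * S.Space'Last);
         begin
            Bigger (1 .. S.Used) := S.Space (1 .. S.Used);
            S.Space := Bigger;
         end;
      end if;
      S.Space (S.Used + 1 .. S.Used + Key'Length) := Key;
      Ref    := (First => S.Used + 1, Length => Key'Length);
      S.Used := S.Used + Key'Length;
   end Add;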

> > Huh? What could you do with a separate hash table that you couldn't
> > do with a map? The hash "buckets" contain *something*, and that
> > something is (or can be) the same as the map elements.
>
> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before. There is no mapping from Integers to anything
> else. Yes, I can do that with a map, by providing a dummy type for the
> element type, and a dummy value for the element parameters, but that's
> an ugly kludge. Defining a map in terms of a hash table is neither ugly
> nor a kludge.

I have a component like that (it's actually Tom Moran's), but in practice,
I've *never* used it without using the index values it provides to manage
some other data in a separate table (at least statistics and/or debugging).
Even the 'known words' list in the spam filter uses the indexes (handles)
for debugging. If that's the case, why bother having to use a separate
component (causing another chance of error)?

So I would guess that the "dummy type" would gain some real data in 95% of
the applications. And that such uses are less than 10% of the uses of a map
anyway. Since this is a minimal library, we're not trying to cover that
remaining 0.5%.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  7:41 PM

Matt Heaney said:

...
> As I mentioned in my previous message, Resize specifies a hint about the
> future number of elements in --that is, the length of-- the container.
> My assumption is that no container will ever have more than Integer'Last
> number of elements.

Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
assume that no container ever has more than 32767 elements??

> If that assumption is incorrect, then maybe the container can be allowed
> to grow internally to more than Integer'Last number of elements, but can
> only report a maximum value of Integer'Last.
>
> Subtype Natural is the correct choice for the vector Resize operation.
>
> I think the ARG wants to use Hash_Type for Resize for the maps.  My
> reference implementation still uses Natural.

Wow! I've been promoted to be the entire ARG! :-)

No, I think we should use a purpose-built type for this, just like we did
for hashing (and for the same reasons). I hope we don't repeat the mistake
of Ada.Strings.Unbounded (which, at least has a justification for making
that mistake).

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  9:19 AM

> Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> assume that no container ever has more than 32767 elements??

I assumed that type Integer corresponded to the "natural" word size of
the machine, and that, if Integer were only 16 bits, this portended
other, more invasive resource issues, which precluded very large numbers
of container elements.

But it just goes to show you I don't know very much...

> Wow! I've been promoted to be the entire ARG! :-)

Sorry about that, I should have said "ARG select committee on
containers" but laziness got the better of me.  I'll try to more clear
in the future.

> No, I think we should use a purpose-built type for this, just like we did
> for hashing (and for the same reasons). I hope we don't repeat the mistake
> of Ada.Strings.Unbounded (which, at least has a justification for making
> that mistake).

OK.  But it would be nice if the operators of the length/count/size type
were directly visible at the point where the container instance is
declared, without having to with Ada.Containers too.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  9:53 PM

Matt Heaney wrote:

> Randy Brukardt wrote:
>
> > Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> > assume that no container every has more than 32767 elements??
>
> I assumed that type Integer corresponded to the "natural" word size of
> the machine, and that, if Integer were only 16 bits, this portended
> other, more invasive resource issues, which precluded very large numbers
> of container elements.

Never assume about the Standard. :-)

Janus/Ada made the choice of leaving Integer at 16-bits to ease porting of our
many 16-bit customers to our 32-bit compilers. That probably was a bad choice
(because it harms portability of other Ada code to Janus/Ada), but in any case
we're pretty much stuck with it. (Changing would break too much existing code
and especially files.)

3.5.4(21) is the only requirement on the range of Integer; there isn't anything
else, not even Implementation Advice, about going further. If you want
something specific, declare your own.

> > Wow! I've been promoted to be the entire ARG! :-)
>
> Sorry about that, I should have said "ARG select committee on
> containers" but laziness got the better of me.  I'll try to more clear
> in the future.

No, this idea was one that I idly mentioned (and dismissed) a couple of days
ago. I'm pretty sure no one else has talked about it (in either direction).

****************************************************************

From: Jeff Cousins
Sent: Tuesday, February 10, 2004  9:45 AM

Given that the Booch components are now available for free from AdaPower, is
there a pressing need for other containers?

Though having said that, we paid for the Booch components but only found
list_single_bounded_managed, list_utilities_single, heap_sort and quick_sort
to be of much use.

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:44 AM

> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

I agree with Matt on this one, especially as regards 'Class.
However, I think that strings should be treated as a special case. It seems
to me that the easiest approach is to provide a special version of the
packages for this case (a wrapper), which accepts String parameters (and
returns String from functions), and uses Unbounded_String internally. This
wrapper can be implemented on top of the basic library (instantiate with
Unbounded_String, and let the wrapper routines simply do the
String <-> Unbounded_String conversions).
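
[Editor's note: A rough sketch of the wrapper Ehud describes. Underlying_Maps
is assumed to be an instantiation of the proposed map container with
Unbounded_String for both key and element; its operation names here are
placeholders, not the proposal's exact profile.]

   with Ada.Strings.Unbounded;  use Ada.Strings.Unbounded;

   package String_Maps is
      procedure Insert  (Key : in String; New_Item : in String);
      function  Element (Key : String) return String;
   end String_Maps;

   package body String_Maps is
      Table : Underlying_Maps.Map;

      procedure Insert (Key : in String; New_Item : in String) is
      begin
         Underlying_Maps.Insert
           (Table, To_Unbounded_String (Key), To_Unbounded_String (New_Item));
      end Insert;

      function Element (Key : String) return String is
      begin
         return To_String
           (Underlying_Maps.Element (Table, To_Unbounded_String (Key)));
      end Element;
   end String_Maps;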

One of the good things about having a standard library is that the
restricted component I described and others like it are going to be easy to
create, and share, seeing as they are based on packages all Ada users are
likely to have available. It is not mandatory they themselves be part of the
standard (though in this case I think it would be a valuable addition).

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:52 AM

> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.

This is a good point, and keep in mind that I firmly belong to the 80/20 camp.
The reason why this point is well taken is that no one is likely to want to
use 2 (or 3) different container libraries inside one application. So the
feature-rich library is likely to win over restricted (even standard) ones.
At least when building the _second_ application using such a library...

However, I don't think this means adding more stuff to Ada.Containers at this
point. Let's be practical here. Time is short etc. etc.

What should be done, however, is for the community to provide more
components based on the same style (and based on the simple building blocks
that are part of the standard lib). Some of these will be adopted into the
core later on, and some will simply coexist nicely with the standard lib
while remaining independent.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004  2:07 PM

On Monday, 9 February 2004 at 23:28, Robert A Duff wrote:
> Regarding support for indefinite keys,
>
> Martin Krischik said:
> > But you could not even store a collection of strings. OK, there are
> > unbounded strings. But storing 'Class, that's the killer feature. If
> > Ada.Containers can't do it I am not interested. There will be no 20%/80%
> > split. It's 0% - I won't use them.
>
> How about this: you write a package that supports the indefinite case,
> and you build it on top of the (currently proposed) standard package
> that supports only definite?

Did that already - but it is based on the Booch components.

> The point is, you *can* use the definite-only package, but only
> indirectly, via a wrapper package.  The definite-only package isn't
> useless; it does *part* of the job you desire.  This seems like a better
> design than making a single package that supports both, and somehow
> magically optimize the definite cases.

Agreed, two packages are better than one. And currently I do the same with the
Booch components - only I create one from the other with the help of a text
filter instead of using a wrapper.

> If the RM supports indefinite, I claim it should do so by providing two
> separate packages.  But we're trying to minimize the size of all this,
> so we choose just the lower-level one of those.

Maybe the RM should suggest names for extended containers.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004  2:16 PM

On Monday, 9 February 2004 at 19:52, Matthew Heaney wrote:

> The library is designed around the common case, which means definite key
> and element types.
>
> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

If a garbage collector was provided as well: Yes. Otherwise NO!!

There is something which upsets me greatly:

Half the Ada community says: No garbage collector please! - The container
library should do memory management.

The other half says: No, container libraries should not provide memory
management.

It would be better for Ada if the Ada community made up its mind.

Well, since in AdaCL I have both, I have made up my mind: container libraries
with memory management are more useful.

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, February 10, 2004  2:55 PM

I just wrote the thing excerpted below.
The whole is available at
  http://www.liacc.up.pt/~maa/containers
Thanks.

-- TRUC : TRUE CONTAINERS
-- by Marius Amado Alves
--
-- Truc is a proof-of-concept implementation of AI-302/3
-- for indefinite elements, i.e. indefinite generic formal
-- element types (reorder the 4 adjectives at will).
--
-- Truc automatically chooses the appropriate implementation
-- for the actual type. Definite actuals select a Charles-like
-- body, whereas indefinite ones select a SCOPE-like one.
--
-- Truc is 100% written in Ada. Some optimizations could be
-- done by going a bit outside the language. This is
-- discussed elsewhere.
--
-- Only the vector variety is implemented.
-- Only a subset of the interface is implemented.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 10, 2004  6:25 PM

I'm going back and filing all of these messages about this AI, and I'm
continually seeing statements like:

"If the containers don't have <my pet feature>, I'm not going to use them."

I realize hyperbole is common on mailing lists, but you have to keep in mind
the current situation.

In order to meet the schedule, the ARG needs to complete proposals by the
end of the June meeting, or they're not going to be in the standard. That
reality means that there is not time to develop a significantly different
proposal. (Wordsmithing is different; I expect there to be plenty of
wordsmithing done on this proposal. I certainly hope that some of the
problems noted by Jeff Carter (for instance) are fixed.)

The strategy proposed by the committee was to standardize something like
AI-302-3, and encourage the development of a secondary standard (at a more
leisurely pace!) to handle creating additional containers to provide
additional functionality, whether performance related (bounded forms, lists,
etc.), functional (sorted_maps, unsorted_sets, etc.), or operational
(indefinite keys, indefinite elements, limited elements). We hope that
providing a standard root will channel future developments in a common
direction, rather than the scattershot approach that's currently prevalent.

The ARG is going to have to decide either to follow that strategy, or
essentially give up (because there is no time to develop an alternative).

Now, when you say "I won't use it.", you're putting the ARG members into a
spot:

1) Either the ARG has to standardize over the objections of users, "because
we know better"; or

2) Decide that there is insufficient consensus, and forget the proposal.

My feeling about the brief discussion at the San Diego meeting is that some
ARG members view this as an unsolvable problem, and would just as soon
forget it (tossing it to some undefined International Workshop Agreement
process). It took a lot of persuading by Tucker (and to a lesser extent, by
me and a couple of others) to set up the committee rather than just tossing it
at that meeting.

I fully expect to revisit that at our next meeting. If the discussion here
gives the opponents too much ammunition, there probably won't be a standard
container library in Ada now (and I personally think *ever*).

If that is your true opinion, feel free to express it - I'd rather spend my
time working on something that will likely be in the standard in that case!
But otherwise, I'd suggest cutting down the hyperbole in your messages.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  4:03 AM

They're not hyperboles. Please don't patronize. The features wanted but
missing from the proposal have been expressed long ago, even
formally, and repeatedly until now, and probably people saw this
discussion as a last chance to win over the "resistance".

It is clear now we must abandon all hope. Thanks for making that clear
at last.

So it's an incomplete library or none at all. I only fear an incomplete
standard can do more harm than good, principally in respect to
attracting new programmers to the language--by creating a bad first
impression. (For what it's worth, I'd say toss it. Those Front and Back
things looked terrible anyway.)

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004  4:25 AM

 > So it's an incomplete library or none at all. I only fear an incomplete
 > standard can do more harm than good, principally in respect to
 > attracting new programmers to the language--by creating a bad first
 > impression. (For what it's worth, I'd say toss it. Those Front and Back
 > things looked terrible anyway.)

I tend to agree with Marius here. Especially if the next change is 5 or
10 years from now! A good programming language needs to provide a decent
container library today. As I have already said, if the library is not broad
enough it will just not be used; Charles, PragmArc or the Booch components
will be used instead... In that case it is not even necessary to add a set
of standard containers...

Just my 2 cents of course.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004  7:57 AM

It looks like the frequent situation of no lack of devil's advocates (who
are only trying to make things better), and too few championing angels :-)

The Ada software I develop can be split into two broad categories:
performance critical, and non performance critical.

When writing the latter, NIH (Not-Invented-Here) is a dirty word to me.  I
want to write software fast and right, and I'll happily reuse standard
components, my own stuff that I've got lying around, and whatever utilities
and libraries have been posted on the Internet for free use.

For example, while Unbounded_Strings gets a lot of abuse in Ada discussions,
it and standard strings have pretty much provided all of the string
processing I've ever needed.

I fully expect as well that I'll be extensively employing Ada.Containers
just as soon as they're standardized.

I don't care if there are some purported conceptual weaknesses or omissions,
or if the implementation could be improved--if it provides the functionality
I need, is effectively bug-free, performs "good enough", and better yet is
part of the Ada standard, it gets used.

I don't want to write infrastructure if there are already packages that
provide it.  I don't want to have to select among the pros and cons of six
different home-grown container package collections and then have to concern
myself with whether the developer is going to maintain them, or if I have to
take on the responsibility for that as well.  (I've never concerned myself
with who's maintaining the Ada.Strings hierarchy, but I do now have to
maintain my own version of a particular container collection.)

I want Ada.Containers, and I will use them.  Make them as good and powerful
as you can, and then shut off the discussion and release them.

****************************************************************

From: Martin Dowie
Sent: Wednesday, February 11, 2004  8:22 AM

> I want Ada.Containers, and I will use them.  Make them as good and powerful
> as you can, and then shut off the discussion and release them.

I'd second that.

What is currently proposed is admittedly limited but it
would be useful. If Matt could adapt "Charles" into the
core of the secondary standard that would be great too.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:36 AM

That is indeed the plan.  The current proposal has only a modest set of
containers but we have to start somewhere.

If you need something right away, there is a reference implementation
available at my home page.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:44 AM

I think you'll find that in spite of its modest size, the containers in
the current proposal are indeed very, very useful.

See in particular the !examples section in the AI itself.

<http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT?rev=1.1>

The reference implementation contains several examples, too.

****************************************************************

From: Martin Krischik
Sent: Wednesday, February 11, 2004  12:38 PM

> The strategy proposed by the committee was to standardize something like
> AI-302-3, and encourage the development of a secondary standard (at a more
> leisurely pace!) to handle creating additional containers to provide
> additional functionality, whether performance related (bounded forms, lists,
> etc.), functional (sorted_maps, unsorted_sets, etc.), or operational
> (indefinite keys, indefinite elements, limited elements). We hope that
> providing a standard root will channel future developments in a common
> direction, rather than the scattershot approach that's currently prevalent.

Ok, you are right there. I can easily live with "indefinite later" - to name
my pet feature - however some expressed an "indefinite never" stance and I
can't live with that.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004  2:23 PM

I fear the participants on this list are rather detached from the "average
Ada programmer" experience.

Of all the dozens of Ada programming _coworkers_ I've worked with over the
years, I could count on one hand (and not even need all the fingers) the
number that would know or care what Charles or PragmArc are (much less
something called an "ARG"), and those few who'd heard of Booch would recall
it as just something that had been used in the early days.

Where are the journeyman programmers for whom Ada is just the language they
write code in going to find data structures?  If it doesn't show up in the
reference manual, it'll be borrowed from some home- or project-grown thing
that was done before, get ginned up yet again from scratch, or maybe get
copied out of a dog-eared Ada textbook.

Meanwhile, the C++ programmers have got the STL handed to them on a platter,
and the Java programmers have got their big JDK posters and Javadocs with
all those containers documented and ready to use.

But for the Ada programmer that just clocks in, codes, and goes home to
their family, nothing.

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004  2:59 PM

 > I fear the participants on this list are rather detached from the "average
 > Ada programmer" experience.

This is not about average anything. It's just that I'm using Ada in the
Information Technology domain. I don't really care(1) about size or
performance; these are not hard real-time or embedded applications. In the IS
field we need a decent container library to speed up development. What I'm
saying is that if the container library is not complete I'll use something
else. And since people in the embedded or real-time field are certainly not
going to use the standard containers, but most probably some simpler version
hand-coded for the application, I'm a bit concerned about the current path...

Ada is *not only* an embedded real-time programming language!

Pascal.

(1) I did not say that I want quick and dirty code :)
(2) BTW, I'm not sure I'm an average Ada programmer :)

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:06 PM

> Ok, you are right there. I can easily live with "indefinite later" -
> to name my pet feature - however some expressed an "indefinite never"
> stance and I can't live with that.

I don't remember anybody saying "indefinite never", but anyway, *my*
opinion is that a secondary standard containing a rich variety of stuff,
including all the bells and whistles that various folks have asked for,
including indefinite component types, would be a Good Thing.

But somebody has to take charge and push such a secondary standard
through.  I'm not volunteering.  ;-)

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  6:02 PM

The only problem is that there doesn't seem to be any mechanism for such
a secondary standard. Indeed, the initial call for proposals for the
standard container library indicated that it was for a secondary
standard, but it is intended to become part of the ARM now.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  10:42 PM

The intent is to use a new ISO procedure called an "International Workshop
Agreement". These get published essentially immediately (no lengthy approvals),
and then can later be turned into real standards if that proves to be a good
idea.

But, as Bob mentioned, there have to be people to drive that "Workshop"
(which doesn't need to be an actual workshop per se).

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, February 12, 2004  2:57 AM

There is such a mechanism: it is called an International Workshop Agreement. It
is a relatively new ISO procedure, giving official status (though not formally
Standard) to a specification for which there is consensus. Such an IWA may
later become a full-fledged standard.

Since it is new, nobody really knows how this works, and whether vendors would
feel bound to provide packages defined by an IWA. But the mechanism is here.

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 12, 2004  12:49 PM

OK. How do we get such a "workshop" set up? I hope it's obvious that I'm
willing to participate.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 13, 2004  4:44 AM

It is an ISO process, so you should get in touch with Jim Moore.
Of course, the first thing is to have a convenor. If you step forward...

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:02 AM

As an example of the approach Bob is advocating here, I have included
two examples in the latest reference implementation.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

The two new examples are for a vector of indefinite elements and a set
of indefinite elements.

Both were implemented as a thin layer on top of the vector and set
containers provided by the library itself.

Neither one took very long to write (in fact I did it while watching an
episode of The Simpsons).

In the indefinite set example, I use the nested generic package
Generic_Keys, and its nested generic package Generic_Insertion.

In the indefinite vector example, I use the library-level Generic_Sort
generic algorithm.

In the test code, I instantiate each component with type String (an
indefinite type).

Note that if you want to instantiate the component with a class-wide
tagged type T'Class, then you'll probably have to declare these
class-wide operations somewhere:

    function Is_Equal (L, R : in T'Class) return Boolean is
    begin
       return L = R;
    end Is_Equal;

    function Is_Less (L, R : in T'Class) return Boolean is
    begin
       return L < R;
    end Is_Less;

and then use these as the generic actuals for "<" and "=".
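
[Editor's note: a minimal sketch of such an instantiation follows; the package
name Indefinite_Vectors and its formal part are assumptions made for
illustration, not the reference implementation's actual names.]

    type Shape is tagged null record;

    function Is_Equal (L, R : Shape'Class) return Boolean is
    begin
       return L = R;
    end Is_Equal;

    package Shape_Vectors is new Indefinite_Vectors
      (Index_Type   => Positive,
       Element_Type => Shape'Class,
       "="          => Is_Equal);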

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  8:59 AM

> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especially when the container is copied or passed around.

The latest version of the reference implementation has examples of a
vector of indefinite elements and a set of indefinite elements.

Internally, both instantiate the underlying container with a simple
controlled type that manages an access object designating the element
type of the higher-level container.

See the Insert_N and Replace_Element operations in the indefinite vector
package for a brief discussion of the various tradeoffs involved.  See
also Generic_Sort2.

See also the indefinite sets package for an example of how to use the
Generic_Keys nested generic package.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:24 AM

> -- Truc is a proof-of-concept implementation of AI-302/3
> -- for indefinite elements, i.e. indefinite generic formal
> -- element types (reorder the 4 adjectives at will).

The latest version of the reference implementation has two new examples:
one for a vector of indefinite elements and another for a set of
indefinite elements.

There is no "automatic" selection of a package.  The programmer chooses
the correct package himself, at the time of instantiation.

If he needs to store indefinite elements, then he instantiates the
package for indefinite elements.

If his element type is definite, then he has the choice of using either
the definite or indefinite packages.  The package for definite elements
will be more efficient, of course.
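
[Editor's note: an illustrative sketch of the manual choice just described;
the name Indefinite_Vectors is a stand-in for whatever the indefinite-element
variant is ultimately called, and the formal part follows the one shown
elsewhere in this discussion.]

    --  Definite element type: instantiate the ordinary (definite) container.
    package Float_Vectors is new Ada.Containers.Vectors
      (Index_Type => Positive, Element_Type => Float);

    --  Indefinite element type (String): the programmer explicitly selects
    --  the indefinite variant.
    package String_Vectors is new Indefinite_Vectors
      (Index_Type => Positive, Element_Type => String);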

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  9:55 AM

> There is no "automatic" selection of a package.  The programmer chooses
> the correct package himself, at the time of instantiation.

I know. I saw your code. It's fine. So one last try: how about configuring
these indefinite elements versions as the one-page specialized needs annex
below? Matt's manual choice approach has the virtue of fitting right in.

ANNEX <IE>

Containers of Indefinite Elements

This Annex provides support for containers of indefinite elements.

[Implementation Requirements]

An implementation conforming to this Annex shall have the package
Ada.Indefinite_Elements and descendants defined in this Annex.

[Static Semantics]

The specifications of the descendants of Ada.Indefinite_Elements are a copy of
the specifications of the descendants of Ada.Containers specified in A.17,
with the unique difference that, for each generic descendant of
Ada.Containers that has a definite element formal type, the corresponding
descendant of Ada.Indefinite_Elements has an indefinite formal type in its
place.

[Dynamic Semantics]

The behaviour associated with each container of Ada.Indefinite_Elements is
exactly like that defined in A.17 for the corresponding container of
Ada.Containers.

[Examples]

Specification of Ada.Indefinite_Elements.Vectors:

  generic
    type Index_Type is (<>);
    type Element_Type (<>) is private;
    with function "=" (L, R : Element_Type) return Boolean is <>;
  package Ada.Indefinite_Elements.Vectors
    -- remainder of this package exactly like that of
    -- Ada.Containers.Vectors

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  11:02 AM

> I just wrote the thing excerpted below.
> The whole is available at
>   http://www.liacc.up.pt/~maa/containers
> Thanks.

It seems inefficient to store *two* vectors for each vector,
and to select between them at run time, when 'Definite is generally
known at compile time.  Why not let the programmer choose to instantiate
one or the other package?

Also, this code uses 'Unrestricted_Access, which is not Ada.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  9:22 AM

> The latest version of the reference implementation has examples of
> indefinite vectors and indefinite sets, both of which can be used to
> instantiate elements of type T'Class.

Good news!

BTW, Truc (www.liacc.up.pt/~maa/containers/truc.ada) has been updated also,
with:
- a test for classwide element types too (passed:-)
- cosmetics

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  12:51 PM

> It seems inefficient to store *two* vectors for each vector,
> and to select between them at run time, when 'Definite is generally
> known at compile time.

As I say in the Truc spec, going outside the language would allow it to be
optimized. I can think of a number of ways to do so, and eliminate those
inefficiencies.

Aside.
The two-vectors problem could perhaps be eliminated inside the language using
tagged types (it would still be dynamic dispatching though, i.e. a runtime
choice). I tried that but Ada got in my way, so I solved the problem
quickly and dirtily.

Anyway it is not a big inefficiency in practice because only one vector is
used, and the other vector (never used, not even initialized) has negligible
space and zero time impact.
End of aside.

>  Why not let the programmer choose to instantiate
> one or the other package?

Staying within Ada, yes, that is better, and supports my suggestion to put the
indefinite variants in a separate package branch defined in a specialized
needs annex. And using the already existing reference implementations by Matt
(released today).

> Also, this code uses 'Unrestricted_Access, which is not Ada.

You've got me, I'll have to change the 100% Ada claim to 99% :-)

I used 'Unrestricted_Access instead of the Rosen trick because element types
must be non-limited. Maybe there's another way, but I couldn't think of it.

Aside.
I used AI302.vectors of stream elements to avoid doing memory management, in
one more experiment in pointerless programming. And for other reasons. For
example, programming for persistence: I can easily get a persistent container
just by changing the stream operations.

I needed write access to an "in" container (a container passed as an in
parameter) because of this stream approach. Now I'm curious if Matt's
implementation has this and how he did it.
End of aside.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:16 PM

Marius Amado Alves wrote:

> I know. I saw your code. It's fine. So one last try: how about configuring
> these indefinite elements versions as the one-page specialized needs annex
> below? Matt's manual choice approach has the virtue of fitting right in.

I like this idea, but I don't think it should be in a Specialized Needs
Annex (i.e. optional for implementers to support it).  The problem is
not that it's hard to support, but that it adds extra verbiage to the
RM.  You've shown, I think, that the extra verbiage could be pretty
small.

We compiler writers can probably even get Matt to code up the
implementation for us.  ;-)

This idea is much better than a magic package that supports both
definite and indefinite efficiently.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:35 PM

True.

However, I think going outside the language is a bad idea.
I say: An efficient implementation should be possible in pure Ada.
As an implementer, I have no intention of adding compiler magic for this
stuff -- I want to be able to just write pure Ada code (or, better yet,
take advantage of Matt's work).

Even Address_To_Access_Conversions makes me nervous -- yeah, it's Ada,
but it's rather ill-specified.

> I can think of a number of ways to do so, and eliminate those inefficiencies.
>
> Aside.
> The two-vectors problem could perhaps be eliminated inside the language using
> tagged types (it would still be dynamic dispatching though, i.e. a runtime
> choice). I tried that but Ada got in my way, so I solved the problem
> quickly and dirtily.
>
> Anyway it is not a big inefficiency in practice because only one vector is
> used, and the other vector (never used, not even initialized) has negligible
> space and zero time impact.
> End of aside.

I don't want users of definite types to pay *any* penalty caused by
supporting indefinite types.

> >  Why not let the programmer choose to instantiate
> > one or the other package?
>
> Staying within Ada, yes, that is better, and supports my suggestion to
> put the indefinite variants in a separate package branch defined in a
> specialized needs annex. And using the already existing reference
> implementations by Matt (released today).

As I said in my previous message, that suggestion seems reasonable,
except for the "specialized needs annex" part.  For portability, we
don't need more optionally-supported features of Ada.

On the other hand, maybe support for indefinite is just too much
(for the Ada RM -- of course a secondary standard should support
all bells and whistles).

> > Also, this code uses 'Unrestricted_Access, which is not Ada.
>
> You've got me, I'll have to change the 100% Ada claim to 99% :-)

OK, but 99% isn't good enough.  I want these packages to be implementable
in 100% pure Ada.  If that's not possible (as in the defaulted-discrims
case somebody mentioned) we need to change the language to *make* it
possible.

> I used 'Unrestricted_Access instead of the Rosen trick because element
> types must be non-limited. Maybe there's another way, but I couldn't
> think of it.

Well, I can think of ways involving "for X'Address use.." or
Address_To_Access_Conversions, and I might be willing to live with that,
but I don't like it.

I didn't read your code carefully enough to understand whether
'Unrestricted_Access was really needed.  Why not declare the thing
aliased, and use 'Unchecked_Access?
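
[Editor's note: a sketch of the alternative Bob suggests, assuming the object
in question can simply be declared aliased; the names here are illustrative
only. 'Unchecked_Access is standard Ada, whereas 'Unrestricted_Access is a
GNAT-specific attribute.]

    declare
       type String_Access is access all String;
       Line : aliased String := "some element";
       Ptr  : constant String_Access := Line'Unchecked_Access;
    begin
       null;  --  use Ptr.all here
    end;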

> Aside.
> I used AI302.vectors of stream elements to avoid doing memory management, in
> one more experiment in pointerless programming. And for other reasons. For
> example programming for persistency: I can easily get a persistent container
> just by changing the stream operations.
>
> I needed write access to an in container because of this stream
> approach. Now I'm curious if Matt's implementation has this and how he
> did it.  End of aside.

Is it not possible to allocate the indefinite thing in the heap,
and still know when it needs to be freed?  I don't like memory leaks...

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  4:24 PM

> I like this idea, but I don't think it should be in a Specialized Needs
> Annex (i.e. optional for implementers to support it).  The problem is
> not that it's hard to support, but that it adds extra verbiage to the
> RM.  You've shown, I think, that the extra verbiage could be pretty
> small.

I actually was going to suggest the same thing, given that the wording
needed is roughly the same as that needed to support a "wide_string" version
of something given a "string" version.

I'll put it as an Open Issue in the "bug fix" update of the AI. (I don't
want to make major changes to the AI, because I don't want to present a
moving target to the ARG members who are supposed to be studying it for the
upcoming meeting...)

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  4:57 PM

I have several places in the reference implementation that I've annotated
with "NOTE": places where the AI's semantics aren't exactly
specified, where there's disagreement, where there can be improvement,
etc.  Should I send you a list or something?  When would you like me to
do that?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  5:21 PM

Sure, do that. Any time is fine, but no later than the start of next week.

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  5:56 PM

>>The specifications of the descendants of Ada.Indefinite_Elements are a copy of
>>the specifications of the descendants of Ada.Containers specified in A.17,
>>with the unique difference that, for each generic descendant of
>>Ada.Containers that has a definite element formal type, the corresponding
>>descendant of Ada.Indefinite_Elements has an indefinite formal type in its
>>place.

This doesn't seem quite right. The containers all have an Element_Type
formal, so it can specify that type. Maps have a Key_Type that should
also be indefinite, so it should specify it as well.

However, this seems like a good way to add support for indefinite
elements to the proposal. If only adding additional containers could be
this easy!

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  5:46 PM

>I'll put it [Annex <IE>] as an Open Issue in the "bug fix" update of the AI.

Great!

Annex <IE> is all it takes to make the proposal "complete". It is
already fairly complete with respect to structural varieties (vector,
set, map). What it is really missing is element type varieties
(definite, indefinite). The group (definite, indefinite) has *exactly*
the same properties as (vector, set, map). Primitive (in the good sense
of course), concise, complete, useful. (Definite, indefinite) as opposed
to (definite, indefinite, tagged, limited, abstract...), like (vector,
set, map) vs. (vector, set, map, queue, list...), these two oppositions
are in perfect alignment. The extra things in each latter group can be
realised with the ones in the former. The proposal is complete only if
it is complete along at least these two axes (structural variety,
element type). The other axes--size, persistence--are of lesser impact.
It does not offend me at all to have them set to a fixed point in the
standard--unbounded, core memory--, and extend them in secondary
standards--(fixed, bounded, unbounded...), (core, cache, file...). A nice
symmetry. Container space has 4 axes (structure, element type, size,
persistence). Alternative 3 with Annex <IE> ranges over 2, and fixes a
point in the other 2. The ranges and points defining the most primitive
region. The standard region. I promise this is my last motivational
rambling for indefinite elements. I needed to have a view of the whole,
evidently I used my "system of coordinates", and I thought I might share
it with you.

Talking of Open Issues: the range vs. discrete index issue. I'd say
range. It solves the problem of Assert failing on enumerations. And the
use of an enumeration for the index of a *variable* length vector does
not make much sense. Ditto for modular types.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:40 PM

One thing that disappoints me about the current containers proposal is
that there's no way to control memory allocation.  The C++ STL allows
the client to define which storage pool should be used.  Would it be
possible for us to do the same, without burdening users who just want to
use "the regular heap"?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  4:17 AM

I don't think so. There were proposals offered for naming the standard
storage pool(s) and allowing defaults for generic formal parameters, but
both of those died an early death. (See AI-299 and AI-300.) Those were aimed
at solving this problem. Since we're not reintroducing existing, killed
proposals (and certainly the need for them in container libraries was well
explained when first considered - there's no "new information" here), it
would have to be done without them.

That means that about the only way to do it would be with an access type
(with "null" meaning use the default pool). That seems very ugly to me,
especially as you would have to make the pool that you want to pass in
"aliased". And there doesn't seem to be a good place for that access type to
live.

In any case, that strikes me as creeping featurism.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  4:34 PM

You could do something like this:

generic
    type Element_Type is private;
    Pool : in out Root_Storage_Pool'Class;
    with function "=" (L, R : Element_Type) return Boolean is <>;
package Ada.Containers.Vectors is ...;   -- for example

There are several problems:

(1) The language standard doesn't specify any storage pool objects.  I
suppose that the standard library could define a few default pool
objects, though.


(2) Even if you do have a pool then you run into problems with static
matching rules, since the generic formal pool type is T'Class, which
doesn't match a specific type NT in T'Class.  So you have to resort to
hacks like:

package My_Pools is

    My_Pool : My_Pool_Type;  -- My_Pool_Type derives from Root_Storage_Pool

    My_Pool_View : Root_Storage_Pool'Class
       renames Root_Storage_Pool'Class (My_Pool);

end My_Pools;

and then use My_Pool_View as the generic actual pool object.
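
[Editor's note: an instantiation sketch using the hypothetical formal part
shown at the top of this message; My_Pool_View is the class-wide view declared
above, and "=" defaults to the predefined equality via "is <>".]

    package Integer_Vectors is new Ada.Containers.Vectors
      (Element_Type => Integer,
       Pool         => My_Pool_View);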

(3) It's in conflict with our design principle that components be easy
to instantiate and use.  I would love to have a generic formal pool
object default a la "is <>" or "is <name>", but the language doesn't let
you specify defaults for generic formal objects.

(4) You might be able to get around (2) by declaring a generic formal
derived type:

generic
    type ET is private;
    type Pool_Type is new Root_Storage_Pool with private;
    Pool : in out Pool_Type;
package Ada.Containers.Vectors is ...;

but then this is in conflict with (3), because now there's another
formal type (which cannot be defaulted).

In C++ generic formal pool objects ("allocators") are allowed to have a
default, by constructing an allocator on-the-fly.  But then there's some
rule about the STL that requires allocator objects be shared (or
something like that)???  And then it complicates things for implementors
because you have to use the "empty virtual base class" trick to avoid
allocating padding for otherwise empty classes.

Realize that adding custom allocator support to the STL complicated the
semantics somewhat (do objects have the same or different allocators? --
affects assignment rules, etc).

An early version of Charles allowed you to pass in a storage pool, but I
eventually gave it up because it was too many headaches for casual users
who didn't care about supplying their own pool.

If you've studied my reference implementation then you might have
noticed that the substrate package (e.g. charles.red_black_trees) used
to implement the higher-level container is written so that all the
allocation and deallocation is done by the higher-level package.   The
substrate package is completely agnostic about how storage allocation
gets done.  This allows the user of the instantiation of the red-black
tree (say) to use a pool if he wants, or indeed even statically allocate
the nodes.  In fact the container elements can even be limited.  The
substrate package doesn't care.  All the ugliness is hidden from the
container user by the wrapper container package.

So I reached the conclusion that if someone (like, um, Bob Duff, who has
written lots of custom storage pools) needs a special sorted set that
uses some fancy storage pool, then it's not too hard to do that using
the substrate package directly and building his own wrapper class.

Perhaps there is a way to do this.  It may be that there's some slick
language trick that I haven't figured out that would allow the user to
pass in his own pool without too much pain at instantiation time.

There is also the multi-threading issue.  Clearly the user has the
responsibility to not allow concurrent access to the same container
object, but what about different threads each manipulating their own
container object, so we have multiple container objects (and hence
multiple threads) sharing a common pool object?  But I suppose you could
use the same synchronization mechanism you use for the allocator "new".

Maybe you could make some other (non-limited) abstraction, and pass that
in as the default, e.g.

    generic
       type ET is private;
       Pool : Pool_Handle := Default_Pool;  --from somewhere
    package Generic_Containers is ...;

but the language doesn't give you anything like the placement new
construct in C++, which allows you to construct an object in-place, at a
location you specify.

There is a sort of hack you can do by declaring a pool object
on-the-fly, that binds to an object (in some raw form e.g. storage
elements) to be constructed using an access discriminant.  Then you make
a dummy call to new, and internally the pool object specifies the
address of the object (to which the pool object is bound) as the address
return value.  The run-time system will then call Initialize on that
object.  Placement new in Ada95!  But that's kind of a trick and I don't
really know if it will work.

****************************************************************

From: Tucker Taft
Sent: Wednesday, February 11, 2004  9:31 AM

I believe the intent is that all of these containers
use controlled types to avoid storage leakage, analogous
to what unbounded strings do.  (In fact, I could imagine
that a vector and an unbounded string would have a lot
in common under the covers.)

So I'm not sure how a user-defined pool would interact
with that (and I fear based on our own experience that
putting finalizable things in user-defined pools can be
tricky).

Note that this will give more incentive for implementors
to "sharpen up" their implementation of controlled types.
I think that is a good thing, so we are spending energy
improving existing features of the language, rather than
dissipating energy on lots of different ways of skinning
the same cat.

****************************************************************

From: Tucker Taft
Sent: Wednesday, February 11, 2004  5:13 PM

Matthew Heaney wrote:
> ...
> (3) It's in conflict with our design principle that components be easy
> to instantiate and use.  I would love to have a generic formal pool
> object default a la "is <>" or "is <name>", but the language doesn't let
> you specify defaults for generic formal objects.

I generally agree with your reasoning, but this particular statement
is false, unless you say "formal IN OUT objects."  Formal IN objects
certainly can have defaults, and the way to pass in a storage pool
would be as Randy suggested, via an access value.  The default could
be implementation-defined, with the semantics that it implies the
standard storage pool, if allowed to default.
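
[Editor's note: a sketch of the formal "in" object with a default that Tucker
describes; the names Pool_Support, Pool_Access and Generic_Vectors are
illustrative only, not part of the proposal.]

    with System.Storage_Pools; use System.Storage_Pools;
    package Pool_Support is
       type Pool_Access is access all Root_Storage_Pool'Class;
    end Pool_Support;

    with Pool_Support; use Pool_Support;
    generic
       type Element_Type is private;
       Pool : in Pool_Access := null;
       --  null (the default) means "use the standard storage pool"
    package Generic_Vectors is
       --  container operations as in A.17
    end Generic_Vectors;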

But I still believe my earlier response, that mixing user-defined
storage pools and controlled types is asking for complexity, and
doesn't seem to buy enough to justify itself.

One reason to have a user-defined storage pool is to do some kind
of garbage collection, or to do mark/release.  Both of those could
easily interfere with the implementation of controlled
types, unless the user was very careful, and had a pretty good idea about
how controlled types were implemented.

****************************************************************

From: Simon J. Wright
Sent: Thursday, February 12, 2004  3:12 AM


Marc A. Criley wrote:

> Where are the journeyman programmers for whom Ada is just the
> language they write code in going to find data structures?  If it
> doesn't show up in the reference manual, it'll be borrowed from some
> home- or project-grown thing that was done before, get ginned up yet
> again from scratch, or maybe get copied out of a dog-eared Ada
> textbook.

For a project of any size I would not expect journeyman programmers
to be making this sort of choice; it should be a matter of policy set
by the software architect(s), along with "how we use tasks", "how we
deal with exceptions" etc.

So the question is, where do the architects find stuff? and clearly
the ARM is a very good start (though I have to admit there are parts
of Annex A that I'm not at all familiar with and should be!
Strings.Maps, for example).

I started maintaining the BCs because I needed containers for a demo
project. Although it has been fun, I would never have done so if the
proposed library had been available, and I strongly support it.

****************************************************************

From: Marc A. Criley
Sent: Thursday, February 12, 2004  7:46 AM

Speaking as a software architect for both Ada and C++ projects, your
characterization of this aspect of the architect's job is quite correct.
One of my sub-tasks was ensuring that only the authorized container classes
were being used, even to the point of once having to threaten to reject a
developer's code if he didn't start using STL instead of coding up his own
comparable classes.

However, I've had plenty of experience with other architects and leads who
aren't on this mailing list, don't visit comp.lang.ada (or comp.lang.c++),
aren't on the Team-Ada mailing list, don't subscribe to any technical
magazines or journals, aren't in ACM, don't go home and code at night or on
weekends, and own only one book covering each programming language they have
to deal with, whether it be Ada, C++, Java, Perl, etc.

When they need container classes, they check the project's code base or they
go to their books, which for Ada does usually include the reference manual.
That's where container  availability needs to be publicized, because if they
do find something on the Web (like Booch or Charles or PragmArc), they need
to first expend the effort to convince themselves that that "home-grown"
collection is something that would be useful, and then overcome most
management's resistance to using "free", unsupported software.  Making
Ada.Containers part of the standard language distribution gives it a cachet
of legitimacy that means that architects/leads don't have to fight that
fight.  And they end up with a container collection that will handle the
needs of most non performance critical projects.

****************************************************************

From: Stephen Leake
Sent: Thursday, February 12, 2004  11:48 AM

Just to state my position for the record:

I like AI-302-3.

I think the rationale for the design needs to be more clearly stated
(particularly why the key and element types are definite).

Examples of how to build packages supporting indefinite types would be
good; an actual standard for that (layered on top of the current one)
would be better, but I can wait for that.

I like the term "cursor" instead of "iterator"; "iterator" is clearly
overloaded, while "cursor" matches the usage in SQL.

Since these packages are intended to be low-level building blocks, I'd
rather see them called "unbounded_array", "hashed_map", and
"sorted_tree". But that's a small issue.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:41 PM

See the latest reference implementation (Thu, 12 Feb 2004) for examples
of using the canonical containers to implement indefinite vectors and
indefinite sets.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>

I'll have at least one example of a map of indefinite elements tomorrow.


> I like the term "cursor" instead of "iterator"; "iterator" is clearly
> overloaded, while "cursor" matches the usage in SQL.

The Iterator design pattern described in the Gamma book says that
"Cursor" is an alias for the term "Iterator", so you seem to be in good
company.


> Since these packages are intended to be low-level building blocks, I'd
> rather see them called "unbounded_array", "hashed_map", and
> "sorted_tree". But that's a small issue.

Low-level is a point of view.

It's a vector implemented as an unbounded array, not an unbounded array
per se.

It's a map, implemented using a hash table, but not a hash table per se.

It's a sorted set, implemented using a balanced (red-black) tree, but
not a tree per se.

Yes, they're building blocks.  Yes, they're low level.  But they're not
as low-level as unbounded arrays, hash tables, and red-black trees.

****************************************************************

From: Stephen Leake
Sent: Friday, February 13, 2004  8:19 AM

> > "sorted_tree". But that's a small issue.
>
> Low-level is a point of view.
>
> It's a vector implemented as an unbounded array, not an unbounded
> array per se.

Hm. Let's compare Ada.Containers.Vectors to SAL.Poly.Unbounded_Array.

Vectors has Insert in the middle, Sort, and Element_Access. SAL allows
indefinite and limited items. Otherwise they are the same.

Sort is a reasonable operation for any container; I would put it in a
child package, since many applications won't need it.

I guess that means SAL.Poly.Unbounded_Array is actually a "vector"?
What would a true low-level unbounded_array look like?

> It's a map, implemented using a hash table, but not a hash table per se.

I need to see your definition of "hash table"; this looks like one to me.

> It's a sorted set, implemented using a balanced (red-black) tree, but
> not a tree per se.

This one I'll grant you; it is more complex than just a tree.

> Yes, they're building blocks.  Yes, they're low level.  But they're
> not as low-level as unbounded arrays, hash tables, and red-black trees.

As long as the names are sufficiently clear, and it is clear how to
name new components that complement these, I'm happy. As I see it, all
of these names have loose enough definitions that this issue is _not_
a show stopper.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:33 AM

> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).

The Size function is analogous to the capacity() member function in the
STL vector class.

The Resize procedure is analogous to the reserve() member function.

A vector container is implemented internally as a contiguous array that
expands as items are inserted into the container.

The Size function returns the length of the internal array.  The Length
function returns the number of elements in the array that are "active,"
that have actually been inserted into the vector.

At all times a vector satisfies the invariant that

    Length (V) <= Size (V)

The procedure Resize tells the vector to expand to at least the size
specified in the call.

If the current size is equal to or greater than the value specified,
then Resize does nothing.

If the current size is less than the value specified, then the internal
array is expanded.  The standard does not specify the exact algorithm
for expansion, and only requires that the Size function return at least
the value specified.

There's nothing special an implementation needs to do to keep track of
the current value of the size, since it has that information already:
it's just the result of the 'Length attribute for the internal array.
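
[Editor's note: a small usage sketch of the contract just described; V is an
arbitrary vector object.]

    Resize (V, Size => 200);
    pragma Assert (Size (V) >= 200);         --  capacity is now at least 200
    pragma Assert (Length (V) <= Size (V));  --  the invariant always holds

    Resize (V, Size => 100);                 --  no effect: Size (V) >= 100 already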

>>Resize is an appropriate name for the operation as specified. I expect
>>an operation named Resize to cause resizing. If we're really talking
>>about giving the implementation a hint about an appropriate size, then
>>not only does the specification need to be changed, the name also needs
>>to be different (perhaps Size_Hint?).

The semantics for Resize are described above.

> I don't see a strong need to change the name, but I do agree with you that
> there shouldn't be a *requirement* to do some allocation.

There is a requirement for allocation only if the current size is less
than the size specified in the call to Resize.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:58 AM

> We compiler writers can probably even get Matt to code up the
> implementation for us.  ;-)

Ask, and you shall receive...

The latest version (12 Feb 2004) of the reference implementation has an
example of a sorted map, implemented using the sorted set and by
instantiating its nested generic package Generic_Keys.

There are also two examples of hashed sets, for both definite and
indefinite elements.  This standard doesn't have a hashed set but if it
did then this is what it would look like.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>

****************************************************************

From: Stephen Leake
Sent: Thursday, February 12, 2004  11:55 AM

> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).

I think that's a bad idea. It means my scenario (preserve the max size
of a container in a config file, and set it next time on startup)
won't work. If I set the max size from yesterday on startup today, but
then the container grows larger today, when I exit and query the size,
I won't get the larger correct size, but just the one I set at the
beginning.

Setting the size should be a hint, but only for a starting point.
Querying the size should always return the current size.

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 12, 2004  12:47 PM

Randy Brukardt wrote:

> Yup. That's precisely how Type'Size works in Ada; it has a fairly
> weak effect on Obj'Size, but in any case, if you set it, you have to
> return the same value (even if that value has nothing to do with how
> objects are actually stored).

Not quite precisely. There are cases where a compiler is required to use
the specified 'Size.

> One allocation per key is a lot more than one allocation per *map*,
> which is what a stringspace implementation takes. (Well, it might
> have to expand if it gets full, but that should be rare. It could
> degrade to one allocation per key if the keys are very, very long,
> but some care in implementation should prevent degrading.)

OK. I misunderstood.

> I have a component like that (it's actually Tom Moran's), but in
> practice, I've *never* used it without using the index values it
> provides to manage some other data in a separate table (at least
> statistics and/or debugging). Even the 'known words' list in the spam
> filter uses the indexes (handles) for debugging. If that's the case,
> why bother having to use a separate component (causing another chance
> of error)?
>
> So I would guess that the "dummy type" would gain some real data in
> 95% of the applications. And that such uses are less than 10% of the
> uses of a map anyway. Since this is a minimal library, we're not
> trying to cover that remaining 0.5%.

It's not "another component"; it's the underlying implementation of the
hashed map component. My point is that we're requiring the
implementation of a hash table, which is a useful component, but not
requiring that it be provided to users. That's like requiring that a
compiler be able to convert strings into numbers, but not having 'Value
in the language. It doesn't require any additional work by implementors,
nor introduce an additional opportunity for errors, but it does increase
the utility of the library.

To me it's a no brainer, as is converting the map part of the "sorted
set" component (Generic_Keys) into its own component: it's no additional
work for implementors, and allows the user to obtain a sorted map with a
single instantiation, instead of 2.

Put another way, the library has 2 different approaches to defining a
map. In one, we have a map component and hide the underlying
implementation (the hash table). In the other, we have the "sorted set"
component, and then a map component implemented in terms of it. We
should at least be consistent, and I argue that our consistency take the
form of providing both the underlying implementation and the map
implemented in terms of it.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:21 PM

> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users.

A hash table might be at the wrong level of abstraction (too low).  The
hashed map actually takes the level of abstraction up a notch.

In my original proposal, I allowed the user to query each bucket of the
underlying hash table array, but the subcommittee rejected that approach
as too low level, in favor of higher-level First and Succ active
iterator operations.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:43 PM

> Setting the size should be a hint, but only for a starting point.
> Querying the size should always return the current size.

Yes, querying the size should always return the length of the internal array.

If the value specified in the call to Resize is larger than the current
length of the internal array, then the internal array is expanded to at
least the length specified.

****************************************************************

From: Simon J. Wright
Sent: Thursday, February 12, 2004 10:42 AM

> The Size function is analogous to the capacity() member function in the
> STL vector class.
>
> The Resize procedure is analogous to the reserve() member function.

...

Do we really need these operations? I presume that they support
optimisation by allocating extra space ahead of time -- do our users
really need that? (assuming of course that the vector will resize
itself if it finds it needs to).

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  11:10 AM

Yes, it enables the optimization you describe -- the Resize preallocates
an internal array large enough to contain all future insertions.

The optimization is important especially for very large numbers of
elements.

But don't take my word for it.  Measure the performance of this procedure:

procedure Not_Optimized (V : in out Vector_Type) is
begin
    for I in 1 .. 1_000_000 loop
       Append (V, New_Item);
    end loop;
end;

and then compare it to this one:

procedure Optimized (V : in out Vector_Type) is
begin
    Resize (V, Size => 1_000_000);

    for I in 1 .. 1_000_000 loop
       Append (V, New_Item);
    end loop;
end;

If you really want to see a difference then use a complex element type,
perhaps one that is controlled and does lots of internal allocation.

I know it makes a difference because I've actually had the problem.  In
my streaming media server, when a file is requested I must load large
indexes comprising several hundred thousand elements that describe the
frames in the file (these are 2 hour movies).

When I first wrote the server there was a huge spike in the CPU monitor
whenever I loaded a file, and this tended to disrupt existing streaming
clients.  (This is a real-time streaming media server, and I have to
service several hundred clients simultaneously.)

I did some analysis and realized it was population of the index vector
that was the cause of my problem.  So I just figured out my total number
of indexes before inserting and then did a Resize.  And now all is well.

So performance matters, and therefore we should keep Size and Resize.
Of course, if your vector objects are small, or you don't have any
special performance needs, then you can just ignore Resize and the
vector will work fine.

****************************************************************

From: Robert A. Duff
Sent: Thursday, February 12, 2004  12:01 PM

I would expect the former to do about lg(1_000_000) = 20 allocations,
and the latter to do 1 allocation, presuming the growth is exponential,
which it should be.  (E.g. double the size each time you run out
of space.)

> So performance matters, and therefore we should keep Size and Resize.

I agree.  I use a similar growable array abstraction quite heavily in my
current project, and there are cases where the code knows the size ahead
of time (or can guess), and I care enough about speed to do the Resize.

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Thursday, February 12, 2004  3:29 PM

> Yes, but access types themselves are not tagged. What they point at is irrelevant.
> If you have a formal "type T is tagged private;" no access type will match
> that; it's the same for interfaces.

Still don't understand: if we can do something useful with, say, an array
(or Unbounded_Array) of interface objects, then why can't we do the same with
an array of accesses to interface objects - just dereferencing them before
calling a member of the interface?

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 12, 2004  4:58 PM

Because you can't create a container (a map, say) of access types in this
model. Remember, an interface has no implementation, so at some point you
have to have a concrete implementation.

Let me try to give a very simple example:

(* Warning *) This is not a serious proposal! (* End Warning *)

   package Ada.Containers is
       type Element_Interface is interface;
       -- Any element operations here (I don't think there need to be any).

       type Cursor_Interface is interface;
       -- Any common cursor operations here.
   end Ada.Containers;

   package Ada.Containers.Interfaces is
       type Forward_Iterator_Container_Interface is interface;
       function Null_Cursor (Container : Forward_Iterator_Container_Interface) return
            Cursor_Interface'Class is abstract;
       function Front (Container : Forward_Iterator_Container_Interface) return
            Cursor_Interface'Class is abstract;
       procedure Increment (Container : Forward_Iterator_Container_Interface;
            Cursor : in out Cursor_Interface'Class) is abstract;
       function Element (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Element_Interface'Class
                 is abstract;
       ...
       -- (It might make more sense to put the "iterator" operations on the Cursor_Interface.
       -- But then you'd need a separate interface just for element access through a cursor.)
   end Ada.Containers.Interfaces;

   with Ada.Containers.Interfaces;
   generic
        type Key_Type is private;
        type Element_Type is new Element_Interface;
        ... -- As before
   package Ada.Containers.Maps is
        type Map_Type is new
           Ada.Containers.Interfaces.Forward_Iterator_Container_Interface
               with private;
           -- Of course, other useful interfaces also would be included here.
           -- Probably including a "map" one.
        type Cursor_Type is new Cursor_Interface with private;

        ... -- As before. (With appropriate Null_Cursor and Increment routines).
   end Ada.Containers.Maps;

Now, to use this, the element type has to 'have' the Element_Interface
interface:
    type My_Element_Type is new Ada.Containers.Element_Interface ... with ...;
You can't instantiate the container with a scalar type or an access type or an
array type or any record that doesn't have the Element_Interface interface.

Now, the point of all of this is that you now can write an iteration routine
that will work for any container having the
Forward_Iterator_Container_Interface. For instance, to create a passive
iterator, you could do (this of course isn't useful, but the ability to write
such things is):

    generic
        with procedure Process (Element : in Element_Interface'Class);
    procedure Iterator (Container : Forward_Iterator_Container_Interface'Class);
    procedure Iterator (Container : Forward_Iterator_Container_Interface'Class) is
        Current : Cursor_Interface'Class := Front (Container);
    begin
        while Current /= Null_Cursor (Container) loop
             Process (Element (Container, Current));
             Increment (Container, Current);
        end loop;
    end Iterator;

Moreover, the instantiations are pretty much the same as the current
proposal. But the element types are limited to tagged types.

****************************************************************

From: Ehud Lamm
Sent: Thursday, February 12, 2004  2:43 AM

But signature packages would work ok, wouldn't they?

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 12, 2004  5:06 PM

Signature packages violate the meta-rule about ease of instantiation: as few
instantiations as possible to get a usable container. (That's one
instantiation, of course.) As far as I can tell, to use them like
interfaces, they'd have to be a parameter to the generic container package.

But perhaps you had something else in mind.

In any case, I don't like signature packages. They add layers of overhead on
a generic sharing implementation (every generic package has a cost; the more
you use, the greater that cost), turning the performance of pretty much
anything into that of bad Java code. (That's not a problem if the signature
doesn't contain anything "expensive", but trying to define that - and work
around it - is a fool's game.)
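
For reference, here is the kind of thing I mean by "using them like
interfaces" (a sketch only; the names are invented):

    generic
       type Container_Type is private;
       type Cursor_Type is private;
       with function First (Container : Container_Type) return Cursor_Type;
       with function Is_Done (Cursor : Cursor_Type) return Boolean;
       with procedure Advance (Cursor : in out Cursor_Type);
    package Forward_Iteration_Signature is
    end Forward_Iteration_Signature;

    generic
       with package Iteration is new Forward_Iteration_Signature (<>);
    procedure Count_Elements
       (Container : in Iteration.Container_Type; Result : out Natural);

Every container that wants to play has to provide an instance of the
signature, and every algorithm that works across containers takes that
instance as a generic formal package -- which is exactly the extra
instantiation (and, for a generic-sharing compiler, the extra overhead)
described above.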

****************************************************************

From: Randy Brukardt
Sent: Friday, February 13, 2004  12:13 AM

Jeffrey Carter:

> > Yup. That's precisely how Type'Size works in Ada; it has a fairly
> > weak effect on Obj'Size, but in any case, if you set it, you have to
> > return the same value (even if that value has nothing to do with how
> > objects are actually stored).
>
> Not quite precisely. There are cases where a compiler is required to use
> the specified 'Size.

Not for a (sub)type. 13.3(48) says that an object's size is *at least* as
large as the specified size. Anything else said is "advice".

...
> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users. That's like requiring that a
> compiler be able to convert strings into numbers, but not having 'Value
> in the language. It doesn't require any additional work by implementors,
> nor introduce an additional opportunity for errors, but it does increase
> the utility of the library.

Not true at all. Building a separate hash table component and then building
a map on top of that would be a horrible implementation performance-wise.
Lots of extra call and generic overhead. So, in practice, they'd have
completely separate implementations -- thus, you'd be doubling the work.

Moreover, the component you're describing (a hash table without elements)
wouldn't have any place to *put* elements. So I don't see how you could even
use it to implement the map. (The hash table component you're suggesting
would return a Cursor object to represent each key, but that item isn't an
index that you could use in a sequence. So how would you associate a key
from the hash table with an element? A linear list would work, but would
essentially make the hash table useless.)

What I suspect would happen in practice is that the relatively useless hash
table component would be implemented in terms of a map with a null record
element type. What's the point in that - the user can do that themselves if
they need it?

> To me it's a no brainer, as is converting the map part of the "sorted
> set" component (Generic_Keys) into its own component: it's no additional
> work for implementors, and allows the user to obtain a sorted map with a
> single instantiation, instead of 2.

Matt will tell you that the difference between a Sorted_Set using
Generic_Keys and a Map (any kind) is that the key doesn't have a separate
existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
is separate from the element. There's obviously a significant space
advantage to avoiding duplicate keys.
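
To make the space point concrete (the type below is invented, purely for
illustration):

    type Employee is record
       Id   : Integer;            -- the key is already a component
       Name : String (1 .. 30);
    end record;

With a Sorted_Set of Employee plus Generic_Keys keyed on Id, each Id is
stored exactly once, inside its element. With a Map from Id to Employee,
the map stores the Id as the key *and* the Employee record still carries
its own Id component -- the duplication referred to above.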

I originally deleted the Generic_Keys component as redundant (because I too
thought it was a Map), then put it back after a discussion on C.L.A. showed
how important it is.

Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
people have asked that the List be put back. But that would quickly balloon
the proposal to double its size, and in any case smacks of "feeping
creaturism". :-)

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  8:32 AM

> Not true at all. Building a separate hash table component and then building
> a map on top of that would be a horrible implementation performance-wise.

Gulp!  I guess Randy hasn't looked at the reference implementation yet...

> Lots of extra call and generic overhead. So, in practice, they'd have
> completely separate implementations -- thus, you'd be doubling the work.

Jeff may have assumed (perhaps by looking at the reference
implementation) that implementors would implement the (hashed) map as a
layer on top of a separate generic hash table component.  But as Randy
notes, implementors won't necessarily implement the map container that
way, and so Jeff is basically advocating that another component
(specifically, a low-level hash table data structure) be added to the
standard library.

> Matt will tell you that the difference between a Sorted_Set using
> Generic_Keys and a Map (any kind) is that the key doesn't have a separate
> existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
> is separate from the element. There's obviously a significant space
> advantage to avoiding duplicate keys.

What Randy told you Matt would tell you is correct...

> I originally deleted the Generic_Keys component as redundant (because I too
> thought it was a Map), then put it back after a discussion on C.L.A. showed
> how important it is.

Yes.  It allows the instantiator to take advantage of properties of the
generic actual set element type that the generic set itself isn't privy
to.  See for example the Indefinite_Sets example in the reference
implementation.

> Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
> and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
> people have asked that the List be put back. But that would quickly balloon
> the proposal to double its size, and in any case smacks of "feeping
> creaturism". :-)

What Randy told you Matt would tell you is once again correct...

****************************************************************

From: Marius Amado Alves
Sent: Friday, February 13, 2004  12:54 PM

I've updated Truc: the "100% Ada" claim is now true. The URL is the same
(www.liacc.up.pt/~maa/containers/truc.ada)

Truc features an implementation of indefinite elements using streams, as an
alternative to Matt's approach using controlled deallocation. This could be of
interest to implementors. But remember Truc was a proof-of-concept and is
missing many standard functions.

The other principal feature of Truc is now merely academic: it automatically
chooses the implementation most appropriate to the actual element type
w.r.t. definiteness. It is now settled that the choice will be manual (done
by the user).

****************************************************************

From: Dan Eilers
Sent: Friday, February 13, 2004  6:44 PM

I think it's a little too soon to say that manual choice is settled.
Certainly it is agreed that there should not be any overhead from
support of indefinite types forced onto users of definite types.

But a user probably prefers not to have to worry about which
flavor of each container to instantiate, just like users of
generic_elementary_functions currently don't have to explicitly
select between single and double precision versions.

You earlier proposed a language extension as an aside:
> Aside. Of course there is still no standard means to do this, but it
> would be a nice extension. Conditional compilation of generic bodies
> based on instantiation properties. Variant units :-)
>   generic
>     type T is private;
>     ...
>   package G is
>     when T'Definite =>
>       ...;
>     when others =>
>       ...;
>   end;
> (On the subject of conditional compilation, see also the recent Ada
> Preprocessor thread on CLA.)

This looks like too large of a change for the benefit, but there
may be a simpler change that would work.  For example, by extending
the syntax for renames to allow a conditional expression, as in:

    generic package p1 is
    end p1;

    generic package p2 is
    end p2;

    with p1, p2;
    generic package p3 renames (if condition then p1 else p2);

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004  9:38 PM

> Because you can't create a container (a map, say) of access types in this
> model. Remember, an interface has no implementation, so at some point you
> have to have a concrete implementation.

I remember that, but I still can't get how it may be possible that

1) we can create a container of interfaces and
2) we can create a container of accesses and
3) we have accesses to interfaces

but at the same time we cannot create a container of accesses to interfaces.

I don't understand how the delayed implementation of interfaces may create
this situation. Let me follow your example:

> Let me try to give a very simple example:
>
> (* Warning *) This is not a serious proposal! (* End Warning *)
>
>   package Ada.Containers is
>       type Element_Interface is interface;

Let's change the above line to:

        type Item_Interface is interface;
        type Element_Access is access all Item_Interface;

>       -- Any element operations here (I don't think there need to be any).
>
>       type Cursor_Interface is interface;
>       -- Any common cursor operations here.
>   end Ada.Containers;
>
>   package Ada.Containers.Interfaces is
>       type Forward_Iterator_Container_Interface is interface;
>       function Null_Cursor (Container : Forward_Iterator_Container_Interface) return
>            Cursor_Interface'Class is abstract;
>       function Front (Container : Forward_Iterator_Container_Interface) return
>            Cursor_Interface'Class is abstract;
>       procedure Increment (Container : Forward_Iterator_Container_Interface;
>            Cursor : in out Cursor_Interface'Class) is abstract;
>       function Element (Container : Forward_Iterator_Container_Interface;
>            Cursor: Cursor_Interface'Class) return Element_Interface'Class is abstract;

and the above function to:

        function Element (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Element_Access is abstract;
        function Item (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Item_Interface'Class is abstract;

>       ...
>       -- (It might make more sense to put the "iterator" operations on the Cursor_Interface.
>       -- But then you'd need a separate interface just for element access through a cursor.)
>   end Ada.Containers.Interfaces;
>
>   with Ada.Containers.Interfaces;
>   generic
>        type Key_Type is private;
>        type Element_Type is new Element_Interface;

change above line to

         type Item_Type is new Item_Interface;
         type Element_Type is access all Item_Type;

>        ... -- As before
>   package Ada.Containers.Maps is
>        type Map_Type is new
>           Ada.Containers.Interfaces.Forward_Iterator_Container_Interface with private;
>           -- Of course, other useful interfaces also would be included here. Probably
>           -- including a "map" one.
>        type Cursor_Type is new Cursor_Interface with private;
>
>        ... -- As before. (With appropriate Null_Cursor and Increment routines).
>   end Ada.Containers.Maps;
>
> Now, to use this, the element type has to 'have' the Element_Interface interface:
>     type My_Element_Type is new Ada.Containers.Element_Interface ... with
> ...;

correspondingly:

  Now, to use this, the element type must be an access to a type that 'has' the
  Item_Interface interface:
     type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
     type My_Element_Type is access all My_Item_Type;

> You can't instantiate the container with a scalar type or an access type or
> an array type or
> any record that doesn't have the Element_Interface interface.

But now, with the above changes we can instantiate the container with an access
to a tagged type that has the Item_Interface interface.

Where am I wrong here - at which point/step?

****************************************************************

From: Randy Brukardt
Sent: Friday, February 13, 2004  9:47 PM

...
> correspondingly:
>
>   Now, to use this, the element type must be an access to a type that 'has' the
>   Item_Interface interface:
>      type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
>      type My_Element_Type is access all My_Item_Type;
>
> > You can't instantiate the container with a scalar type or an access type or
> > an array type or any record that doesn't have the Element_Interface interface.
>
> But now, with the above changes we can instantiate the container with an access
> to a tagged type that has the Item_Interface interface.
>
> Where am I wrong here - at which point/step?

This works, of course, but now you can only instantiate with
access-to-interfaces. That's even more limiting than just interfaces -
because you have to do all of the memory management yourself. If you've been
following along here, I'm sure you've noticed that that won't do.

You could of course support this as an alternative implementation with both
sets of stuff around. But then you've instantly doubled the size of the
library -- and you still can't have a container of floats or of arrays
(especially of unconstrained arrays). Wrappers are very space-inefficient in
the first case, and barely possible for unconstrained arrays (the code to
use them will be very ugly).

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:08 AM

The current version of the reference implementation has examples of
indefinite sets, maps, and vectors.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040213.zip>

However, I have discovered a potential anomaly in indefinite containers
that I wanted to make users aware of.

An indefinite container is implemented by storing a pointer to the
(indefinite) element, and doing the allocation and deallocation of the
element behind the scenes during insertion and deletion.
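
Concretely, the technique looks roughly like this. (This is a sketch of a
single "cell" only, with invented names -- not the reference
implementation.) An item-less insertion corresponds to a cell whose pointer
is still null, which is where the questions below come from:

    with Ada.Unchecked_Deallocation;

    generic
       type Element_Type (<>) is private;   -- indefinite formal is allowed
    package Indefinite_Cells is
       type Cell is limited private;
       procedure Set (C : in out Cell; Item : in Element_Type);
       procedure Clear (C : in out Cell);
       function Element (C : Cell) return Element_Type;  -- fails if empty
    private
       type Element_Access is access Element_Type;
       type Cell is limited record
          Ptr : Element_Access;   -- null until an item is assigned
       end record;
    end Indefinite_Cells;

    package body Indefinite_Cells is

       procedure Free is new Ada.Unchecked_Deallocation
          (Element_Type, Element_Access);

       procedure Set (C : in out Cell; Item : in Element_Type) is
       begin
          Clear (C);
          C.Ptr := new Element_Type'(Item);  -- allocation behind the scenes
       end Set;

       procedure Clear (C : in out Cell) is
       begin
          Free (C.Ptr);                      -- no-op if Ptr is already null
       end Clear;

       function Element (C : Cell) return Element_Type is
       begin
          return C.Ptr.all;                  -- Constraint_Error if Ptr is null
       end Element;

    end Indefinite_Cells;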

The issue comes up in the item-less forms of insertion.  In that case,
there is a null pointer for the element.  This has several consequences.

Consider the vector.  When we do an item-less insert, does that mean we
copy the internal pointer up to the next position, and leave a null
pointer at the insertion position?  Or do we leave the original element
there and make a copy of the element to slide up?

When we delete vector elements, do we move pointers down, and leave null
element pointers behind?  Or are we required to make a copy to slide down?

What should the passive iterator do when it hits a null element pointer?
Skip that position or just raise Constraint_Error?

Should we generalize Replace_Element, to allow a "null element" as the
replacement value?

Does Generic_Element return a null pointer if the element pointer is
null, or does it raise CE?

What should sort do with null elements?  Assume that a null element is
always less than a non-null element?

This affects streaming of elements too, because you have to stream out
an extra bit to indicate whether the element is null or not.

My tentative assumption is that we'll have to omit the item-less
insertion operations in the indefinite containers.  This mostly applies
only to vector and map, but I still have to analyze the behavior of
indefinite set Generic_Keys nested package.

The reference implementation doesn't do anything special for indefinite
vectors.  The indefinite map handles null elements.  I will fix both
this weekend, as I prepare an errata list for Randy.

If the developers who want indefinite containers have an opinion about
these matters then please speak up.

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004  10:19 AM

Matthew Heaney wrote:

> I have discovered a potential anomaly in indefinite containers
> that I wanted to make users aware of.
>
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion.  In that case,
> there is a null pointer for the element.  This has several consequences.
>
>...
>
> Should we generalize Replace_Element, to allow a "null element" as the
> replacement value?

No.

>...
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.

Yes, I think that this is the right way to go - exclude item-less elements from
containers for indefinite types altogether. This is a clear and well-justified
restriction, and it will not harm the usefulness of those containers significantly.
From the user's viewpoint I believe that this restriction is a fair price for
admission of indefinite types in containers (with basic memory management),
at least in a basic library.

****************************************************************

From: Marius Amado Alves
Sent: Friday, February 13, 2004  10:45 AM

First what is item-less insertion? Checking the AI I guess it is this:

<<
procedure Insert_N
  (Vector : in out Vector_Type;
   Before : in     Index_Type'Base;
   Count  : in     Natural);

Equivalent to Insert_N (Vector, Before, Count, New_Item), with the
difference that the elements in the Count positions starting at Before
are not assigned.
>>

That is (correct me if I'm wrong), the inserted elements hold garbage.
Garbage is garbage, definite or indefinite.
Any attempt to read garbage should raise an exception (I'm checking now if the
AI has this provision; it should).

Sorry, I'm not following strictly your questions, but I think I'm answering
them.

Another 'problem' is what a proper multiple insertion (Insert_N/4 with Count >
1) does for indefinite elements: repeat the same pointer or create N copies
of the item? Value semantics, man. Create N copies.

****************************************************************

From: Marius Amado Alves
Sent: Friday, February 13, 2004  11:18 AM

Matt,

I'm rechecking your questions one by one now, against the 'philosophy'
expressed in my previous post.

...
> Consider the vector.  When we do an item-less insert, does that mean we
> copy the internal pointer up to the next position,

yes

> and leave a null
> pointer at the insertion position?

A null or another internal sign of garbage.

>  Or do we leave the original element
> there and make a copy of the element to slide up?

This does not make sense. The user is inserting garbage.

> When we delete vector elements, do we move pointers down, and leave null
> element pointers behind?  Or are we required to make a copy to slide down?

Move pointers. And leave *nothing* behind. Shrink the vector, as per the spec.

> What should the passive iterator do when it hits a null element pointer?

Whatever it does when it hits an unassigned (garbage) element. Definite or
indefinite.

> Skip that position or just raise Constraint_Error?

Definitely raise something. But in definite elements too. And maybe a more
specific exception. Value_Error. Data_Error.

> Should we generalize Replace_Element, to allow a "null element" as the
> replacement value?

I'm not sure I understand. Please do not create another special entity.
Definitely not Null_Element, which the user would have to define.

> Does Generic_Element return a null pointer if the element pointer is
> null, or does it raise CE?

See above.

> What should sort do with null elements?  Assume that a null element is
> always less than a non-null element?

I say raise something.

> This affects streaming of elements too, because you have to stream out
> an extra bit to indicate whether the element is null or not.

Again, raise.

That is, in practice, forbid sort or (container-wide) streaming of a container
with (yet) unassigned elements.

Let the user create his own 'null' element value, if he needs to process it.

> My tentative assumption is that we'll have to omit the item-less
> insertion operations in the indefinite containers.

No. Or, if omitting it, omit it in the definite forms too.

> This mostly applies
> only to vector and map, but I still have to analyze the behavior of
> indefinite set Generic_Keys nested package.

With the philosophy subsumed in my replies, that analysis should be clear ;-)

Please note my solution implies containers have a 'validity' state. Namely, if
they contain unassigned elements they are invalid w.r.t. some operations e.g.
sort. Maybe a Valid predicate should be added to the spec. Alternatively, we
can simply remove the creation of unassigned elements i.e. omit the item-less
insertion.

> The reference implementation doesn't do anything special for indefinite
> vectors.  The indefinite map handles null elements.  I will fix both
> this weekend, as I prepare an errata list for Randy.
>
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.

****************************************************************

From: Stephen Leake
Sent: Friday, February 13, 2004  11:48 AM

Matthew Heaney <mheaney@on2.com> writes:

> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.

ok, good.

> The issue comes up in the item-less forms of insertion. In that
> case, there is a null pointer for the element.

Why would I want to do this? Seems bogus to me. Just remove this
operation, all the problems go away!

I had not noticed these versions of Insert before. Do you have an
example of when they are useful?

Note that for definite Item_Type, you can still get Constraint_Error
from an itemless Insert, unless the element is initialized to some
valid value.

> Consider the vector.  When we do an item-less insert, does that mean
> we copy the internal pointer up to the next position, and leave a null
> pointer at the insertion position?

Yes.

> Or do we leave the original element there and make a copy of the
> element to slide up?

Why should the null pointer case be any different than the non-null case?

> When we delete vector elements, do we move pointers down, and leave
> null element pointers behind? Or are we required to make a copy to
> slide down?

I guess you mean what do you leave in vector (last + 1). I would move
pointers down, and leave a null pointer (again, this is the same
whether we have null inserts or not).

> What should the passive iterator do when it hits a null element
> pointer? Skip that position or just raise Constraint_Error?

Raise Constraint_Error. The user asked for it.

> Should we generalize Replace_Element, to allow a "null element" as
> the replacement value?

no. Unless you have an example of when that would be useful.

> Does Generic_Element return a null pointer if the element pointer is
> null, or does it raise CE?

Raise Constraint_Error.

It might be nice to have a version of Generic_Element that returns the
pointer, rather than the element. As Maps.Generic_Element does.

> What should sort do with null elements?  Assume that a null element is
> always less than a non-null element?

Raise Constraint_Error.

> This affects streaming of elements too, because you have to stream
> out an extra bit to indicate whether the element is null or not.

Raise Constraint_Error.

> My tentative assumption is that we'll have to omit the item-less
> insertion operations in the indefinite containers. This mostly
> applies only to vector and map, but I still have to analyze the
> behavior of indefinite set Generic_Keys nested package.

Ok by me.

> The reference implementation doesn't do anything special for
> indefinite vectors. The indefinite map handles null elements. I will
> fix both this weekend, as I prepare an errata list for Randy.
>
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.

I have :).

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  5:33 PM

>>The issue comes up in the item-less forms of insertion. In that
>>case, there is a null pointer for the element.
>
> Why would I want to do this? Seems bogus to me. Just remove this
> operation, all the problems go away!

That's what I'll do.

> I had not noticed these versions of Insert before. Do you have an
> example of when they are useful?

Because you don't always have a value to assign immediately.  What you
want to do is make space in the vector for all the items, and then do
the assignment.  For example, suppose you want to copy a list into a vector:

   V : Vector_Type;

procedure Copy (List : List_Type; I : Index_Type) is
    C : Cursor_Type := First (List);
    J : Index_Type := I;
begin
    Insert_N (V, Before => I, Count => Length (List));

    for K in 1 .. Length (List) loop
       Replace_Element (V, Index => J, By => Element (C));
       Increment (C);
       J := Index_Type'Succ (J);
    end loop;
end Copy;

If you don't do it this way -- inserting and assigning one element at a time,
so that each insertion slides the tail of the vector -- then your time
complexity is O(n*m) instead of O(n+m).

>>My tentative assumption is that we'll have to omit the item-less
>>insertion operations in the indefinite containers. This mostly
>>applies only to vector and map, but I still have to analyze the
>>behavior of indefinite set Generic_Keys nested package.
>
> Ok by me.

This simplifies the model.  Let's do it this way.

****************************************************************

From: Randy Brukardt
Sent: Friday, February 13, 2004  10:54 PM

Matt Heaney wrote:

> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion.  In that case,
> there is a null pointer for the element.  This has several consequences.

Well, you have to decide precisely what containers you are creating. (That's
the designer's job, I think.)

Consider the Sequence.
(Aside: I don't think the name "Vector" is going to make it, given that
AI-296 has about 10 years dibs on that name. And I don't think we want two
different things with the same name in the standard...)

If your container supports sparse sequences, then you need to decide what it
means to not have an element at a position. And whatever that decision is,
it probably ought to be the same for both forms. I tend to agree that
referencing an empty element should cause an exception in that case (it's
better than returning garbage). (Which means that Sorting and [passive]
Iteration would raise that exception when the first empty element was
reached.)

OTOH, if your container does not support sparse sequences, then I don't see
why you ought to have item-less forms of insertion in the first place.
Inserting nothing is a mistake if you can't have undefined elements.

In either case, it is clear that deletion should shrink the (virtual) length
of the sequence. To do anything else would mean that you couldn't reliably
iterate on a sequence that has ever been deleted from. That seems goofy. Of
course, that doesn't mean that you need to change the length of the internal
array. And doing so means that it is irrelevant how items past the logical
end of the array are represented.

I do think that if you support sparse sequences, you need to be able to
stream them in and out. They seem to be potentially useful (imagine a
histogram vector; values that never occurred would not need any value at
all), and if they are legitimate at all, they have to be streamable. Of
course, if you don't support sparse sequences and you get one anyway, that's
a bug. Crashing is fine. :-)

I know that at least some readers have thought that sparse sequences are
supported. So a definitive decision on that is needed.

****************************************************************

From: Robert A. Duff
Sent: Saturday, February 14, 2004  9:56 AM

> (Aside: I don't think the name "Vector" is going to make it, given that
> AI-296 has about 10 years dibs on that name. And I don't think we want two
> different things with the same name in the standard...)

I don't really agree.  They are widely-separated enough that confusion
can be avoided.

We already have "dispatching", which means an indirect call when you're
talking about tagged types, but means choosing which task to run when
you're talking about tasks.  "Pragma Controlled" and
"Finalization.Controlled" are totally unrelated.  A "stub" in the DS
Annex has something to do with inter-process communication; a "stub" in
the core language is a syntactic placeholder for a body.  Probably
more...

So there's precedent for using confusing terminology when
convenient.  ;-)

"Vector" is good because it matches what other languages call the thing,
and it's short, unlike "Growable_Array" and the like.

[snipped stuff I agree with]

> I know that at least some readers have thought that sparse sequences are
> supported. So a definitive decision on that is needed.

Yes, this is another case where I think the programmer needs to know
(via impl advice or whatever) what's going on under the hood.

****************************************************************

From: Nick Roberts
Sent: Saturday, February 14, 2004  3:51 PM

Randy Brukardt wrote:

>>Since an indefinite Key_Type is required for
>>Containers.Maps.Strings, why not make that capability available to the
>>users?
>
> We definitely expect that the strings container will use a purpose-built
> data structure for storing strings, not some general indefinite item
> capability. Ways to compactly and efficiently store sets of varying size
> strings are well known and commonly used.
>
> Such algorithms could be extended to a general "unconstrained array of
> elementary", but that hardly seems to be a worthwhile definition for keys.

The key value of each element stored in a map (implemented as a hashed
array) must also be stored. Since the Element_Type is definite, making the
Key_Type definite as well makes it possible for the key values (as well as
the element values) to be stored in a fixed array.

This has the advantage of making the implementation simpler, but the
disadvantage of not supporting indefinite key types (which I reckon would
be useful in a significant minority of cases).

Simplifying the implementation has two benefits: implementation costs are
reduced and the risk of failure (bugs) reduced; executional efficiency
(speed more than memory use in this situation) is likely to be increased.

I understand Randy is arguing that executional efficiency should be
considered of relatively low importance for these containers, and I agree.

On the other hand, implementation simplification is, I suspect, going to be
considered quite important by the ARG (and WG9?).

I would, on balance, prefer an indefinite key type, but I've set out the
reasons why a definite key type would be preferred, and I would guess these
reasons would prevail.

>>Another point: Containers.Vectors.Size should return Index_Type'Base,
>>and the Size parameter in Resize should also be Index_Type'Base. It's
>>confusing to have different types for Size and Index.
>>
>>There's also a problem if Natural'Last < Index_Type'Last; you
>>can't have a vector that contains every index!
> ...
> So I don't see a great solution. I wondered about using "Hash_Type" here (it
> has the correct properties), but that seems like a misuse of the type (and a
> bad idea in a library that most Ada programmers will read - you want to show
> them good style in standard libraries).

My preferred solution would be to remove the Index_Type generic parameter
altogether, and make the index type Standard.Positive. I believe this would
have the advantage of simplifying the package from the user's point of
view, it would solve at a stroke the problems mentioned above, and I
believe that no-one in practice will ever need to use a different index type.

****************************************************************

From: Robert A. Duff
Sent: Sunday, February 15, 2004  11:57 AM

I disagree.  Using different index types for different kinds of arrays
is a very useful way to catch bugs, even when all those index types are
basically just 1..2**31-1.  This is true for the normal built-in array
types, and also for growable ones (Vectors).

I have a growable-array generic in my current project that is
instantiated dozens of times, and it has a "range <>" parameter for the
index type.  Some instantiations share the same index type, but most
have their own, and I think that's a Good Thing.

Furthermore, using Positive doesn't solve Randy's problem -- he's got a
compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
address space, so you very well might want Vectors longer than
Positive'Last.

Furthermore, if the Index_Type is "range <>" (which I think it should
be), then the Size can reasonably be of a subtype declared like this:

    subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;

As I said before, allowing Index_Type to be modular or enumeration is
not useful, and introduces anomalies.

****************************************************************

From: Matthew Heaney
Sent: Sunday, February 15, 2004  1:08 PM

Bob Duff wrote:

> Furthermore, if the Index_Type is "range <>" (which I think it should
> be), then the Size can reasonably be of a subtype declared like this:
>
>     subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;

Bob you have my latest API for the vector container.  What I did as a
replacement for Natural is this:

   type Element_Count is range 0 .. <implementation-defined>;

There's also a Positive_Element_Count subtype.

I don't know if this is the way you want to go but at least it's a start.

I like your idea above, too.  One issue is that the T'Last of the
size/length type (Size_Type'Last in your example) needs to be at least the
value of Index_Type'Last - Index_Type'First + 1.

I'm not sure your scheme above will work since Index_Type'Base might not
have all those values.  Consider using subtype Natural as the generic
actual index type, which means you have one too many values to represent.

There's always going to be some type that's too big.  Suppose I
instantiate the vector with Long_Long_Integer?  In that case I don't have
any integer type that can fit the number of values that are theoretically
possible.

I don't think there's any real issue for generic actual index types with a
large range, since you're not going to put that many elements in the
vector container anyway.  The problem cases are when you use a type with a
smaller range, e.g.

   type My_Index_Type is range -128 .. 127;

The number of possible container elements is 256, but T'Base'Last might
only be 127 (indeed that's all it's required to be).

Of course we could require that users declare their type to have the
required properties:

   Last : constant := 127;
   First : constant := -128;
   N : constant := Last - First + 1;

   type My_Index_Type_Base is range First .. N;

   type My_Index_Type is new My_Index_Type_Base
      range First .. Last;

But this is probably too subtle for typical language users.

In the new reference implementation I sent you I use System.Max_Int to
declare the Element_Count type, which means casual users would only have
an issue for a generic actual index type such as Long_Long_Integer (whose
use as an index type I would expect to be rare).


> As I said before, allowing Index_Type to be modular or enumeration is
> not useful, and introduces anomalies.

The generic formal index type was also changed as you suggested, to use
the stronger form "range <>" instead of the weaker form "(<>)".

****************************************************************

From: Ehud Lamm
Sent: Sunday, February 15, 2004  4:34 AM

> Ehud Lamm wrote:
> > But signature packages would work ok, wouldn't they?
>
> Signature packages violate the meta-rule about ease of
> instantiation: as few
> instantiations as possible to get a usable container. (That's one
> instantiation, of course.) As far as I can tell, to use them like
> interfaces, they'd have to be a parameter to the generic
> container package.
>
> But perhaps you had something else in mind.
>

I agree with the rationale behind the meta-rule: simple things should be simple.

The signatures will not be required in order to use the containers. They
will only be required once you try to write code that should work across
containers AND across libraries.
Since this isn't going to be the most common scenario this probably falls
outside the 80/20 guideline, so I'll leave it at that.

Personally, I like signature packages and interface-oriented programming,
and I would have liked the library to encourage this style even more than it
currently does. But what's now on the table is still a big step forward.

****************************************************************

From: Jeffrey Carter
Sent: Sunday, February 15, 2004  9:29 PM

Randy Brukardt wrote:

> Moreover, the component you're describing (a hash table without elements)
> wouldn't have any place to *put* elements. So I don't see how you could even
> use it to implement the map. (The hash table component you're suggesting
> would return a Cursor object to represent each key, but that item isn't an
> index that you could use in a sequence. So how would you associate a key
> from the hash table with an element? A linear list would work, but would
> essentially make the hash table useless.)

Apparently I'm not making myself clear. Consider:

generic -- Hash_Tables
    type Element is private;
    with function "=" (Left, Right : Element) return Boolean is <>;
    with function Hash (Item : Element) return Hash_Value is <>;
package Hash_Tables is
    type Hash_Table is private;

    procedure Insert (Into : in out Hash_Table; Item : in Element);
    -- Inserts Item into Into. If Into contains an Element X such that
    -- Item = X, replaces X with Item.

    procedure Delete (From : in out Hash_Table; Item : in Element);
    -- If From contains an Element X such that Item = X, deletes X
    -- from From. Otherwise, has no effect.

    function Is_In (Item : Element; Table : Hash_Table) return Boolean;
    -- If Table contains an Element X such that Item = X, returns True;
    -- Otherwise, returns False

    function Get (Item : Element; From : Hash_Table) return Element;
    -- If From contains an Element X such that Item = X, returns X.
    -- Otherwise, raise Constraint_Error.
private -- Hash_Tables
    ...
end Hash_Tables;

generic -- Hashed_Maps
    type Key_Info is private;
    type Element is private;
    with function "=" (Left, Right : Key_Info) return Boolean is <>;
    with function Hash (Item : Key_Info) return Hash_Value is <>;
package Hashed_Maps is
    type Hashed_Map is private;

    procedure Insert (Into : in out Hashed_Map; Key : in Key_Info;
       Item : in Element);
    -- Inserts Key/Item into Into. If Into contains a key X such that
    -- Key = X, replaces the Element associated with X with Item.

    procedure Delete (From : in out Hashed_Map; Key : in Key_Info);
    -- If From contains a key X such that Key = X, deletes X and the
    -- Element associated with it from From. Otherwise, has no effect.

    function Is_In (Key : Key_Info; Map : Hashed_Map) return Boolean;
    -- If Map contains a key X such that Key = X, returns True.
    -- Otherwise, returns False.

    function Get (Key : Key_Info; Map : Hashed_Map) return Element;
    -- If Map contains a key X such that Key = X, returns the Element
    -- associated with X. Otherwise, raises Constraint_Error.
private -- Hashed_Maps
    type Hash_Node is record
       Key  : Key_Info;
       Item : Element;
    end record;

    function "=" (Left, Right : Hash_Node) return Boolean;
    -- Performs Left.Key = Right.Key.

    function Hash (Item : Hash_Node) return Hash_Value;
    -- Performs Hash (Item.Key).

    package Implementation is new Hash_Tables (Element => Hash_Node);

    type Hashed_Map is record
       Table : Implementation.Hash_Table;
    end record;
end Hashed_Maps;

Insert, Delete, and Is_In should be obvious. Get would be implemented as

    Dummy : Hash_Node;
begin -- Get
    Dummy.Key := Key;
    return Implementation.Get (Dummy, Map.Table).Item;

Obviously a lot of functionality is missing from this simple example,
but it clearly demonstrates how a hash table can be used to implement a
map, while leaving the hash table available for those who are not
storing key/value pairs.

Yes, I know these won't compile :)

****************************************************************

From: Randy Brukardt
Sent: Monday, February 16, 2004 10:19 PM

> Apparently I'm not making myself clear. Consider:

Definitely. :-)

...
> Obviously a lot of functionality is missing from this simple example,
> but it clearly demonstrates how a hash table can be used to implement a
> map, while leaving the hash table available for those who are not
> storing key/value pairs.

OK, what you're calling a Hash Table is what Matt called a Hashed Set. To
me, a hash table is an index without any elements at all - it's used as part
of the implementation of some larger component.

In any case, as I said earlier, that implementation (which is very similar
to Matt's) would be horrible on our compiler. You'd end up with 3 separate
allocations per element, plus a bunch of call overhead. Other compilers
mileage may vary (although I'd expect most would generate better code
without the extra generic).

So, you cannot assume that there is "no extra cost" here; it would be
another entire component. It would, of course, be very similar to the
"Sorted_Set" component, so it's hard to see that there is enough value to
having a separate container for The Standard, but I'd expect it to appear in
the secondary standard (along with List and Sorted_Map).

****************************************************************

From: Nick Roberts
Sent: Monday, February 16, 2004  5:47 PM

Robert A Duff wrote:

> Nick Roberts wrote:
>
>> My preferred solution would be to remove the Index_Type generic
>> parameter altogether, and make the index type Standard.Positive. I
>> believe this would have the advantage of simplifying the package from
>> the user's point of view, it would solve at a stroke the problems
>> mentioned above, and I believe that no-one in practice will ever need
>> to use a different index type.
>
> I disagree.  Using different index types for different kinds of arrays
> is a very useful way to catch bugs, even when all those index types are
>  basically just 1..2**31-1.  This is true for the normal built-in array
>  types, and also for growable ones (Vectors).

I think you are fundamentally wrong on this point, Bob. And I mean
'fundamentally', as I am looking at it from a very purist point of view
(perhaps too purist, I'm not sure). I'll try to explain.

I think arrays (in Ada and similar languages) are used for two
fundamentally different purposes: (a) as a mapping, from the index subtype
to the element subtype; (b) as a sequence of elements.

What marks out the difference between (a) and (b) is that for a sequence,
it is the order of the elements that is of primary importance. A good
example of usage (a) is the array type Schedule in RM95 3.6 (28), which
maps from Day to Boolean. A good example of usage (b) is a String.

In usage (b), the index type is merely used to indicate the relative
positions of the elements of the sequence, and it has long become common
and programming (at least in Ada!) convention to call the first element
number 1, the second number 2, and so on. In mathematics, the set N of
natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost
always used for this purpose. In Ada, the subtype Positive is almost always
used (it is used for String), and I think it makes logical sense to use the
same subtype for this single purpose.

I believe that, in practice, an extensible array will only ever have usage
(b). Therefore, logically, I think the index type should always be Positive.

I think this argument is reinforced by the tangle that using a generic
Index_Type has obviously got you into. If you simply use Positive, the
problems all go away. Isn't that a bit of a hint?

> I have a growable-array generic in my current project that is
> instantiated dozens of times, and it has a "range <>" parameter for the
>  index type.  Some instantiations share the same index type, but most
> have their own, and I think that's a Good Thing.

Then ask yourself the question: how difficult would it be to remove the
"range <>" parameter and use Positive instead throughout? I suspect you
would find this quite easy to do, and that the result would be easier to
read and understand.

> Furthermore, using Positive doesn't solve Randy's problem -- he's got a
>  compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> address space, so you very well might want Vectors longer than
> Positive'Last.

I doubt that very much (that you very well might want Vectors longer than
Positive'Last). Presumably this decision was made having being satisfied
that users would not want any String to be longer than 2**15-1 characters.

Surely it would be silly to expect users to be happy with this constraint
on strings, but rebel against it applying to extensible arrays? Surely, if
users of this implementation really required bigger extensible arrays, they
would almost certainly also demand bigger strings, in which case the right
solution would be to make Integer 32-bit based?

> Furthermore, if the Index_Type is "range <>" (which I think it should
> be), then the Size can reasonably be of a subtype declared like this:
>
> subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;

This might be considered a reasonable solution, but it could go wrong. If
Index_Type'First < 1, it might be possible for an extensible array to reach
a length greater than Index_Type'Base'Last. [I think the term 'length' is
more appropriate than 'size'.]

This solution imposes another subtype (or maybe type) upon the user; one
for each instantiation of the extensible array package, in effect. A user
would be annoyed when, for example, trying to compare the length of one
extensible array to that of another (with a different Index_Type'Base), to
find the compiler complaining:

    type Apple_Count is range 0..100; -- maximum of 100 apples
    type Orange_Count is range 0..2000; -- maximum of 2000 oranges

    subtype Apple_Index is Apple_Count range 1..Apple_Count'Last;
    subtype Orange_Index is Orange_Count range 1..Orange_Count'Last;

    package Apple_Baskets is
       new Ada.Containers.Vectors(Apple_Index,Apple);
    package Orange_Baskets is
       new Ada.Containers.Vectors(Orange_Index,Orange);

    Apple_Basket: Apple_Baskets.Vector_Type;
    Orange_Basket: Orange_Baskets.Vector_Type;

    ...

       if Size(Apple_Basket) < Size(Orange_Basket) then

This comparison might not work on some implementations. Worse, it might
work on other implementations, and the user could be pretty mystified as to
why.

       if Size(Apple_Basket)
             < Apple_Baskets.Size_Type(Size(Orange_Basket)) then

seems ugly, and could raise Constraint_Error, and

       if Natural(Size(Apple_Basket)) < Natural(Size(Orange_Basket)) then

seems to defeat the purpose (of not simply using Positive as the index
type), and could also raise Constraint_Error (with Randy's compiler, for
example).

I think my way is simpler and better: instantiations of the package do not
require an Index_Type; there is no need for a separate size/length
(sub)type. It is easier to understand and there is less to go wrong.

As an aside, I would reiterate that I think the name 'Size' for the
ordinality function is confusing, and ought to be 'Length', to accord with
the meaning of the Length attribute.

> As I said before, allowing Index_Type to be modular or enumeration is
> not useful, and introduces anomalies.

And I think replacing Index_Type with Positive would reduce the anomalies
still further.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 16, 2004 10:56 PM

> I believe that, in practice, an extensible array will only ever have usage
> (b). Therefore, logically, I think the index type should always
> be Positive.

That's only true if we're not supporting sparse sequences. (And perhaps not
even then.) I disagree with Bob and Matt that modular indexes aren't useful,
and can even imagine uses for enumeration index types (although that would
be rare enough not to worry about).

> I think this argument is reinforced by the tangle that using a generic
> Index_Type has obviously got you into. If you simply use Positive, the
> problems all go away. Isn't that a bit of a hint?

Yeah, and if we got rid of the generic and just made the elements void *
we'd have less problems still. :-)

Seriously, Ada is about strong typing, and you're suggesting to deny the
programmer the power of strong typing in this package. That's a non-starter
in my view.

...
> > Furthermore, using Positive doesn't solve Randy's problem -- he's got a
> >  compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> > address space, so you very well might want Vectors longer than
> > Positive'Last.
>
> I doubt that very much (that you very well might want Vectors longer than
> Positive'Last). Presumably this decision was made having been satisfied
> that users would not want any String to be longer than 2**15-1 characters.

That's a complete fallacy. The reason this decision was made (in 1987!) was
that we wanted to be able to migrate users from our 16-bit MS-DOS compilers
to our 32-bit compilers with as little incompatibility as possible. The
intent was that if a program was recompiled on a 32-bit compiler, it would
run and work, including being able to read and write files in the same
format.

> Surely it would be silly to expect users to be happy with this constraint
> on strings, but rebel against it applying to extensible arrays? Surely, if
> users of this implementation really required bigger extensible arrays, they
> would almost certainly also demand bigger strings, in which case the right
> solution would be to make Integer 32-bit based?

If someone wants a 32-bit string, all they have to do is write:

    type Long_Natural is range 0 .. 2**31-1;
    subtype Long_Positive is Long_Natural range 1 .. Long_Natural'Last;
    type Long_String is array (Long_Positive range <>) of Character;

which works fine (except for the language-defined packages). Moreover, this
will work on essentially any Ada compiler (including our 16-bit MS-DOS
compilers) without any dependence on the definitions of predefined types.

OTOH, making Integer 32-bit would use more data memory (potentially a lot
more), and could make existing files unreadable. The amount of pain for a
programmer to change from 16-bit Integer to 32-bit Integer depends on the
code of course, but it can be worse than moving to another compiler
altogether. We don't want to be encouraging our customers to move to another
vendor!

The only real option would be to have a compiler switch of some sort to
select which is used, but that would require lots of work in the compiler -
everything assumes a single definition for Standard. (Yes, we've studied it
seriously, as the choice of 16-bit for Integer is a significant portability
issue - far too many people assume the range of that type, where if they
really care about the range, they should declare their own type.) There are
many other things of more value to our customers at this time.

No Ada program should depend on predefined elementary types. Period.
Unfortunately, type String drags in Natural, leaving no real chance to
enforce a decent Ada style (you can't easily tell when a use of Natural is
for indexing String, or when it is being abused). That's a bug in the Ada
design, but one we're going to have to live with.

> This solution imposes another subtype (or maybe type) upon the user; one
> for each instantiation of the extensible array package, in effect. A user
> would be annoyed when, for example, trying to compare the length of one
> extensible array to that of another (with a different Index_Type'Base), to
> find the compiler complaining:

I agree, but not with your solution. Clearly, there should be a Size_Type
next to Hash_Type in Ada.Containers. If you actually need to do math on it
(which should be very rare), you'd need a "use type
Ada.Containers.Size_Type;", but with any decent style, you'll need that no
matter what the type is or where it is declared. You don't want it in the
generic unit (for the reasons you stated), Natural is clearly bad (use
predefined scalar types only for String in new code - we want to show
readers of the standard good style), so a type is needed somewhere fairly
high up in the hierarchy.
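
Roughly what I have in mind (a sketch only; the actual bounds are whatever
the implementation picks):

    package Ada.Containers is
       type Hash_Type is mod <implementation-defined>;
       type Size_Type is range 0 .. <implementation-defined>;
       ...
    end Ada.Containers;

and, at the (rare) use site that needs arithmetic or comparison on sizes:

    with Ada.Containers;  use type Ada.Containers.Size_Type;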

****************************************************************

From: Matthew Heaney
Sent: Monday, February 16, 2004  11:54 PM

I got rid of the subtype Natural in the container packages, per Randy's
request.

I modified the proposal and the reference implementation so that each
generic package declares its own modular Element_Count type.  In the case
of the map it just derives from Hash_Type; in the vector and set it's its
own declaration.

My issue with Randy's solution is that the operators for the size type
aren't visible where the instantiation is visible, so you have to with
Ada.Containers specially.  (But is that really true?  I still have to
check that.)  By declaring the type right in the generic package, the user
has immediate access to the size type.

Perhaps it's not such a big deal to have to make a special with of
Ada.Containers.  I don't really know.  One advantage of Randy's solution
is that the packages can share the size type.  So for example you can pass
the result of the Length function of one container to the Resize operation
for some other container, and no type conversion is necessary.  On the
other hand, doing that across different container instantiations might be
rare.

So where the size type lives, what its name is, etc, is still very
tentative.  The next release will merely show one way to do it.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 17, 2004  8:42 AM

...
> In usage (b), the index type is merely used to indicate the relative
> positions of the elements of the sequence, and it has long become common
> and programming (at least in Ada!) convention to call the first element
> number 1, the second number 2, and so on. In mathematics, the set N of
> natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost
> always used for this purpose. In Ada, the subtype Positive is almost always
> used (it is used for String), and I think it makes logical sense to use the
> same subtype for this single purpose.

Positive is rarely used in well-written Ada code, except when using
String.  It was a language-design mistake to use Positive for String;
there should have been a separate String_Index type.  It was also a
language design mistake to put non-standard stuff like Integer and
Long_Integer in Standard.

> I believe that, in practice, an extensible array will only ever have usage
> (b). Therefore, logically, I think the index type should always be Positive.

I agree with the above philosophy (mappings vs sequences).  However, it
does not follow that sequences should always be indexed by Positive.
It should usually be indexed by a type whose range is 1..<something>.
There are good reasons why the programmer might want different upper
bounds.  There are also some cases where 0..<something> makes more
sense for a sequence.  Therefore, we should leave this choice to the
programmer.

Furthermore, it is important to allow the programmer to use different
index types for unrelated sequences, in order to prevent bugs.
For the same reason, when I declare a sequence-like array type,
I usually declare a new index type for it.  If two array types
are related so that I want to say things like:

    for I in ... loop
        ... A(I) ...
        ... B(I) ...

then I use the same index type for both.

> I think this argument is reinforced by the tangle that using a generic
> Index_Type has obviously got you into. If you simply use Positive, the
> problems all go away. Isn't that a bit of a hint?

My proposal has no "tangles" that I can see.  All the tangles are caused
by using modular or enumeration types for the index, which I don't
recommend.

> > I have a growable-array generic in my current project that is
> > instantiated dozens of times, and it has a "range <>" parameter for the
> >  index type.  Some instantiations share the same index type, but most
> > have their own, and I think that's a Good Thing.
>
> Then ask yourself the question: how difficult would it be to remove the
> "range <>" parameter and use Positive instead throughout? I suspect you
> would find this quite easy to do, and that the result would be easier to
> read and understand.

It would of course be trivial to remove that capability, but that's not
the issue.  It would damage the type checking, so I wouldn't do that.

By the way, my growable arrays generic says:

    pragma Assert(Index_Type'First = 1);

I did run into one case where that was inconvenient, and I wanted
sequences starting at 100_000_000, 200_000_000, etc.
I decided not to remove that assertion, though.

> > Furthermore, using Positive doesn't solve Randy's problem -- he's got a
> >  compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> > address space, so you very well might want Vectors longer than
> > Positive'Last.
>
> I doubt that very much (that you very well might want Vectors longer than
> Positive'Last). Presumably this decision was made having been satisfied
> that users would not want any String to be longer than 2**15-1 characters.
>
> Surely it would be silly to expect users to be happy with this constraint
> on strings, but rebel against it applying to extensible arrays? Surely, if
> users of this implementation really required bigger extensible arrays, they
> would almost certainly also demand bigger strings, in which case the right
> solution would be to make Integer 32-bit based?

Well, the machine in question is a 32-bit machine, so Integer really
*should* be 32 bits.  But Randy chose 16 bits for compatibility reasons,
which makes perfect sense.  Perhaps if Randy's customers had followed
good coding practise, he wouldn't have been forced into that decision.  ;-)

> > Furthermore, if the Index_Type is "range <>" (which I think it should
> > be), then the Size can reasonably be of a subtype declared like this:
> >
> > subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;
>
> This might be considered a reasonable solution, but it could go wrong. If
> Index_Type'First < 1, it might be possible for an extensible array to reach
> a length greater than Index_Type'Base'Last.

So don't do that.  You and I already agreed that Index_Type'First = 1,
usually.  Even if it's 0, you can't create a Vector that big, presuming
the upper bound is 2**31-1 on a 32-bit machine.

>... [I think the term 'length' is
> more appropriate than 'size'.]

I agree that size is not ideal.  But we're not talking about the
*current* length, we're talking about the maximum length we can grow
to without doing more allocation.  How about Buffer_Length, which
appropriately indicates that we're talking about the internal buffer?

> This solution imposes another subtype (or maybe type) upon the user; one
                                         ^^^^^^^^^^^^^

I said subtype, not type.  We're measuring number of components, here,
not bytes.  So it makes perfect sense to use the same type for indexing
as for this size measurement (but obviously a different subtype).

...
>     ...
>
>        if Size(Apple_Basket) < Size(Orange_Basket) then
>
> This comparison might not work on some implementations. Worse, it might
> work on other implementations, and the user could be pretty mystified as to
> why.

Heh?  First of all, given my proposal, the above comparison would be
illegal on *all* implementations.  That's what I want -- if
Apple_Baskets and Orange_Baskets are unrelated, then I *want* that
comparison to be illegal.  On the other hand, if the two abstractions
are related in such a way that indexes into one make sense for the
other, then the programmer should say so -- use the same index type for
both instantiations.  This should be the programmer's choice.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 17, 2004  8:38 AM

> No Ada program should depend on predefined elementary types. Period.

So you don't use Boolean in your programs?  Maybe it's
"(False, Maybe, True)" on some implementations?  ;-)

Sorry, I couldn't resist -- I of course know what you meant.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 17, 2004  8:53 AM

> I got rid of the subtype Natural in the container packages, per Randy's
> request.

Maybe you should wait for the whole ARG to come to a decision before you
make further changes in this area.

> I modified the proposal and the reference implementation so that each
> generic package declares its own modular Element_Count type.  In the case
> of the map it just derives from Hash_Type; in the vector and set it's its
> own declaration.

In the map and set, it should probably be a *signed* type: "type
Element_Count is range 0..implementation-defined".  It's got nothing to do
with Hash_Type.

For Vector, it is related to the Index_Type, and should therefore be a
subtype of the same type:

    subtype Element_Count is Index_Type'Base range 0..Index_Type'Base'Last;

You might, for example, want to set the size to twice the current length
of the vector.  Both types are in the same "units", as it were -- number
of components, so they should be the same type.
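
For example (assuming a vector object V, and that Length and Resize both use
this subtype):

    --  Grow the buffer to twice the current length; both values are counts
    --  of components, so no conversion is needed:
    Resize (V, 2 * Length (V));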

(The above assumes that you agree with me that Index_Type should be
"range <>"; I know Randy, and perhaps others, don't agree with that.)

Furthermore, whether two different vectors should have the same
Index_Type and Element_Count type should be the programmer's choice.

Note that sets/maps are different from vectors -- in the former case,
the implementation controls the maximum size (it's related to available
memory), whereas in the vector case, the programmer controls the max
size by choosing the value of Index_Type'Last.
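
For example, a programmer who wants a vector capped at 10_000 elements by
construction might write something like this (the formal names here are
assumptions based on the draft, and "=" is assumed to be defaulted):

    type Queue_Index is range 1 .. 10_000;  --  at most 10_000 elements, by design
    package Queues is new Ada.Containers.Vectors
       (Index_Type => Queue_Index, Element_Type => Integer);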

> My issue with Randy's solution is that the operators for the size type
> aren't visible where the instantiation is visible, so you have to with
> Ada.Containers specially.  (But is that really true?  I still have to
> check that.)

You don't need an extra with_clause, but you would need an extra
use_clause.  I agree that's slightly annoying.

>...By declaring the type right in the generic package, the user
> has immediate access to the size type.

But by making it a subtype of the type of Index_Type, all the operators
will be visible wherever the instance is visible.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 17, 2004  9:39 AM

> Maybe you should wait for the whole ARG to come to a decision before you
> make further changes in this area.

OK.  Randy wanted an errata list early this week, and I wasn't sure
whether I was responsible for coming up with the version that didn't use
the Natural subtype.  It sounds like you guys already have some other ideas.

> In the map and set, it should probably be a *signed* type: "type
> Element_Count range 0..implementation-defined".  It's got nothing to do
> with Hash_Type.

OK.  That's the kind of feedback I was looking for.

I also wasn't sure whether you wanted signed or unsigned types as the
size/count/length type.  I guess I assumed you'd want unsigned, since
that gives you a bigger range.


...
> (The above assumes that you agree with me that Index_Type should be
> "range <>"; I know Randy, and perhaps others, don't agree with that.)

My tentative conclusion was to do as you suggested, and restrict the
vector to use only integer index types.  However, it appears that there
is still debate among the subcommittee, so I guess it's still an open issue.

The only problem with your scheme above is that Index_Type'Base doesn't
necessarily include all the values you need.  For example:

    type Index_Type is range -10 .. 5;

Index_Type'Base'Last might only be 5, but we need it to be at least 16.

However, since this is supposed to be an expandable array, then maybe
the index type above doesn't make any sense.

Note that I'm not married to the name Element_Count; it was just an
idea.  I was using the container analog of type Storage_Count as the
model.  The name Size_Type might be better, which is the closer to the
style of name Hash_Type, and to the style of the actual container names.

> Furthermore, whether two different vectors should have the same
> Index_Type and Element_Count type should be the programmer's choice.
>
> Note that sets/maps are different from vectors -- in the former case,
> the implementation controls the maximum size (it's related to available
> memory), whereas in the vector case, the programmer controls the max
> size by choosing the value of Index_Type'Last.

OK.  I was assuming the model was the same for all containers (max
elements is controlled by available memory).

>>My issue with Randy's solution is that the operators for the size type
>>aren't visible where the instantiation is visible, so you have to with
>>Ada.Containers specially.  (But is that really true?  I still have to
>>check that.)
>
> You don't need an extra with_clause, but you would need an extra
> use_clause.  I agree that's slightly annoying.

I wasn't sure about that.  I was thinking that in order to say "use type
Ada.Containers.Size_Type", you had to with Ada.Containers too.  But it
sounds like I was wrong.

> But by making it a subtype of the type of Index_Type, all the operators
> will be visible wherever the instance is visible.

Yes.  I like using Index_Type'Base, but wasn't sure whether we would run
into snags wrt the base range of the type being large enough.  It sounds
like that's not really an issue.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 17, 2004  10:20 AM

> I also wasn't sure whether you wanted signed or unsigned types as the
> size/count/length type.  I guess I assumed you'd want unsigned, since
> that gives you a bigger range.

This is why I hate modular types.  One is tempted to use them when
wraparound arithmetic is inappropriate, just to get one extra bit.
(IMHO, "type T is range 1..2**32-1;" should be legal on all
implementations -- for that matter, so should "range 1..10**100".
But I realize that's a pretty radical notion!)

Anyway, in this case, the extra bit probably isn't necessary.  You can't
create a vector of 2 billion integers on a 32-bit machine -- you'll run
out of address space first.  Even if the component type is Character,
you're unlikely to want to do that.  I believe many operating systems
steal half the address space for their own use, so no single process can
use more than 2 billion bytes anyway.  On a 64-bit machine, a vector of
2**62 components is unthinkable anytime soon.

As I said, "1..<something>" will be the most common index range, in
which case 'Length can't be more than 'Last.  If that's not enough, buy
a compiler that supports bigger signed integers.

I want overflow/constraint checking on that type.  So I suggest signed
integer rather than modular.

> The only problem with your scheme above is that Index_Type'Base doesn't
> necessarily include all the values you need.  For example:
>
>     type Index_Type is range -10 .. 5;
>
> Index_Type'Base'Last might only be 5, but we need it to be at least 16.

Yes, it is possible to shoot yourself in the foot.  So don't do that.  ;-)

This is already an issue in Ada -- the programmer must take care to make
sure base ranges are wide enough.  Nothing new here.

> However, since this is supposed to be an expandable array, then maybe
> the index type above doesn't make any sense.

It would be rare, I'd say.

...
> OK.  I was assuming the model was the same for all containers (max
> elements is controlled by available memory).

Well, I suppose it *usually* will be -- the programmer will use an
Index_Type that goes up to roughly the size of the address space.  But
the programmer can choose a smaller Index_Type, and there are sometimes
good reasons to do so.

...
> I wasn't sure about that.  I was thinking that in order to say "use type
> Ada.Containers.Size_Type", you had to with Ada.Containers too.  But it
> sounds like I was wrong.

If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
visible.  Look at the definition of "mentioned in a with_clause".
This is because compilers might have trouble dealing with holes in
the visibility -- cases where something is in scope, but the thing it's
declared inside of is not.

Use clauses don't work like that.
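
For example, assuming the container packages end up as children of
Ada.Containers, this compiles without a separate with of Ada.Containers:

    with Ada.Containers.Vectors;  --  implicitly also "withs" Ada and Ada.Containers

    procedure Demo is
        use type Ada.Containers.Hash_Type;  --  legal: Ada.Containers is visible
    begin
        null;
    end Demo;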

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 17, 2004  10:56 AM

...
> If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
> visible.  Look at the definition of "mentioned in a with_clause".
> This is because compilers might have trouble dealing with holes in
> the visibility -- cases where something is in scope, but the thing it's
> declared inside of is not.
>
> Use clauses don't work like that.

I guess I'm still confused.  I just tried this:

with Character_Vectors;  use Character_Vectors;

procedure Test is
    use type Ada.Containers.Hash_Type;
begin
    null;
end Test;

but GNAT is telling me that I'm

"missing with for Ada.Containers"

I put a subtype declaration in the vectors package, like this:

subtype Hash_Type is Containers.Hash_Type;

and then I could say:

    use type Character_Vectors.Hash_Type;

But that's different from what Randy said to Nick:

 >I agree, but not with your solution. Clearly, there should
 >be a Size_Type next to Hash_Type in Ada.Containers. If you
 >actually need to do math on it (which should be very rare),
 >you'd need a "use type Ada.Containers.Size_Type;", but with
 >any decent style, you'll need that no matter what the type
 >is or where it is declared.

I didn't know how to get "use type Ada.Containers.Size_Type;" to work
without also with'ing Ada.Containers.  But perhaps Randy meant something
else?  I'm not sure.

If you want to declare

    type Size_Type is range 0 .. <implementation-defined>;

in Ada.Containers, I assumed you'd have to also declare a Size_Subtype
in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this:

    subtype Size_Subtype is Size_Type;

and then the user would have to say:

with Instantiation;  use type Instantiation.Size_Subtype;

But that's different from saying "use type Ada.Containers.Size_Type;".

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 17, 2004  11:35 AM

> OK, what you're calling a Hash Table is what Matt called a Hashed Set. To
> me, a hash table is an index without any elements at all - it's used as part
> of the implementation of some larger component.

We've already established that what Matt calls a "set" isn't.

I'm afraid you're not making yourself clear now. With rare exceptions,
hash functions can produce the same hash value for different elements.
This results in "collisions". Therefore, hash tables store the elements
so a lookup can determine if a specific element is actually in the
table, or just hashes to the same value as another element. Since an
element can contain information not used in calculating the hash or for
"=", it seems that a hash table has to have an interface something like
the one I presented.

In other words, without seeing something more specific (like a spec), I
can't tell how your idea of a hash table would work.

> In any case, as I said earlier, that implementation (which is very similar
> to Matt's) would be horrible on our compiler. You'd end up with 3 separate
> allocations per element, plus a bunch of call overhead. Other compilers
> mileage may vary (although I'd expect most would generate better code
> without the extra generic).

The solution is simple: don't use your compiler :)

For most applications that will be willing to use a standard component,
I doubt the performance will be unacceptable on any compiler.

> So, you cannot assume that there is "no extra cost" here; it would be
> another entire component. It would, of course, be very similar to the
> "Sorted_Set" component, so it's hard to see that there is enough value to
> having a separate container for The Standard, but I'd expect it to appear in
> the secondary standard (along with List and Sorted_Map).

The component would have to be specified, of course. I'm sure Matt or I
would be able and willing to do that, and it wouldn't take very long.
There is no extra implementation cost. Implementors are going to have to
implement a hash table in order to implement hashed maps anyway. Let's
be good software engineers and allow the reuse of that effort.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 18, 2004  5:12 PM

> I'm afraid you're not making yourself clear now. With rare exceptions,
> hash functions can produce the same hash value for different elements.
> This results in "collisions".

Of course. But to me, a hash table is just a table (array); collision
handling is not part of it. Collision handling is a necessary part of a
component, of course, which is why it's impossible to have a component that
is just a hash table.

But arguing over terminology is pointless. You're arguing in favor of Matt's
Hashed_Set (even if you don't want to call it that). It's better to stick to
a common set of terminology, even if you don't like it.

...
> > In any case, as I said earlier, that implementation (which is very similar
> > to Matt's) would be horrible on our compiler. You'd end up with 3 separate
> > allocations per element, plus a bunch of call overhead. Other compilers
> > mileage may vary (although I'd expect most would generate better code
> > without the extra generic).
>
> The solution is simple: don't use your compiler :)

Them's fighting words, even with the smiley. Being intolerant of the
diversity of Ada implementations (and uses) is a good way to get yourself
tuned out of ARG deliberations.

> For most applications that will be willing to use a standard component,
> I doubt the performance will be unacceptable on any compiler.

Which of course is exactly the argument I've been making all along. Of
course, then the Sorted_Set and the Vector are also good enough -- which is
quite contrary to your position.

...
> The component would have to be specified, of course. I'm sure Matt or I
> would be able and willing to do that, and it wouldn't take very long.
> There is no extra implementation cost. Implementors are going to have to
> implement a hash table in order to implement hashed maps anyway. Let's
> be good software engineers and allow the reuse of that effort.

I've already said multiple times that there would be a significant extra
implementation cost. Even though some of the implementation could be reused,
there would still be a lot of unique work. In any case, repeating a
falsehood doesn't make it true.

But imagine for a moment that you're right, and there is not a line of extra
code that needs to be written. You're still doubling the documentation,
debugging, and testing costs for implementers. Clearly, this component will
need a unique set of tests, and while there is a bit of sharing available,
most of it will need to be different. And even if there are no bugs in the
implementation at all, you still have to do the testing. So the cost will be
a lot more than zero.

****************************************************************

From: Stephen Leake
Sent: Tuesday, February 17, 2004  12:43 PM

...
> Because you don't always have a value to assign immediately.  What you
> want to do is make space in the vector for all the items, and then do
> the assignment.  For example, suppose you want to copy a list into a
> vector:
...
> If you don't do it this way, then your time complexity is O(n*m)
> instead of O(n+m).

Ok. I actually ran across a similar situation in Real Code this
weekend :).

If you were doing Insert (at end) rather than Insert (in the middle),
your time complexity would be O(m), right? (n is the size of the
vector, m is the size of the list).

In general, Insert (in the middle) is an O(n) operation. So Insert_N
(in the middle, no elements) is an optimization to work around that in
some common cases.

I think if you are really doing code like this, and you want the
optimization, you should make the Vector Item_Type be an access type,
and manage the memory yourself. Optimized code is always harder to
write.

So I'm affirming that deleting the itemless insertion from the
indefinite map is ok.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 17, 2004  1:10 PM

> Ok. I actually ran across a similar situation in Real Code this
> weekend :).

This happens all the time: you know in advance how many items you want
to insert, so you tell the vector allowing it to preallocate, and then
you do the insert.


> If you were doing Insert (at end) rather than Insert (in the middle),
> your time complexity would be O(m), right? (n is the size of the
> vector, m is the size of the list).

Yes, that's correct.  The n part reduces to 0, because you're not
sliding elements already in the vector container.

> In general, Insert (in the middle) is an O(n) operation. So Insert_N
> (in the middle, no elements) is an optimization to work around that in
> some common cases.

Yes.  It is specifically designed for inserting in the middle of a vector.

In the case of the STL, what happens is that you specify an iterator
pair designating the half-open range of the source container.  The
vector probably computes the distance() first, then does the internal
expansion, and then walks the source range constructing each new vector
element in place.

For a std::vector, the distance() function is specialized so that it
computes the distance in constant time (because vector iterators are
random access iterators, and therefore distance() can be implemented
for a vector by simple subtraction).

We can't get this sophisticated in Ada, but we can be almost as
efficient.  Instead of the vector itself calling distance(), it's the
vector user who computes the distance (by whatever method makes sense),
and then calls Insert_N to do the preallocation.

So in this particular case (inserting multiple elements in the middle of
a vector), in Ada the complete insertion operation actually comprises
two separate calls.
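
In code, the pattern is roughly as follows.  This is only a sketch: the
itemless form of Insert_N and the exact profiles of Insert_N and
Replace_Element are assumptions based on the current draft, and the element
type is assumed to be Character:

    --  Copy the characters of Source into vector V in front of position
    --  Before, sliding the existing elements only once.
    procedure Copy_In (V      : in out Vector_Type;
                       Before : in     Index_Type;
                       Source : in     String) is
       Pos : Index_Type'Base := Before;
    begin
       Insert_N (V, Before => Before, How_Many => Source'Length);  --  preallocate
       for K in Source'Range loop
          Replace_Element (V, Pos, Source (K));  --  assign in place, no sliding
          Pos := Pos + 1;
       end loop;
    end Copy_In;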

> I think if you are really doing code like this, and you want the
> optimization, you should make the Vector Item_Type be an access type,
> and manage the memory yourself. Optimized code is always harder to
> write.

No.  Doesn't that argument undermine the case for indefinite forms?  The
Insert_N operation provides important and useful functionality, just
like Resize does.  There's nothing special about indefinite vectors, and
the same techniques for optimized insertions apply as for the definite form.

> So I'm affirming that deleting the itemless insertion from the
> indefinite map is ok.

I think they need to stay.  If nothing else the definite and indefinite
forms require a more or less identical interface.

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Tuesday, February 17, 2004  2:25 PM

This is return to the topic of interfaces in conjunction with Container
Library, to the starting point of recent brief discussion - now I'm taking
another branch of argumentation, which addresses the topic in the most direct
way.

...
> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.

I'm not 100% sure exactly what you see as a problem with generic
interfaces in the Container Library, but guessing that you mean massive
duplication of declarations of operations, I came up with an idea for
overcoming this problem with generics and making the employment of interfaces
in the library rather smooth. Let's introduce a new form of interface
declaration:

  type IT is interface of T; -- where T is a type, possibly generic one

This will mean that IT is an interface, which consists of declarations of all
public primitive operations of T, in which all occurrences of type T are
substituted by the interface IT. Type T automatically implements IT.

If T in the above declaration is a generic type, then IT is a generic
interface. In that case an instantiation (perhaps partial) may be made inside
the declaration, if needed:

  type IT is interface of T<instantiation-parameter(s)>; -- for generic T

I think that this form of interface declaration will solve the problem
mentioned above.
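
To illustrate (type and operation names are purely hypothetical), given

  type Counter is tagged private;
  procedure Increment (C : in out Counter);
  function  Value     (C : Counter) return Natural;

the declaration "type Counter_Interface is interface of Counter;" would be
shorthand for

  type Counter_Interface is interface;
  procedure Increment (C : in out Counter_Interface) is abstract;
  function  Value     (C : Counter_Interface) return Natural is abstract;

with Counter implicitly implementing Counter_Interface.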

[Also, this form may be extended even further - by not requiring T to be a
tagged type (but the interface type IT will still be tagged) - with the same
definition, that is, the interface IT consists of all primitive operations
(which are all public in this case) of T. But this probably isn't directly
related to the Container Library.]

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 17, 2004  6:03 PM

...
> > If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
> > visible.  Look at the definition of "mentioned in a with_clause".
> > This is because compilers might have trouble dealing with holes in
> > the visibility -- cases where something is in scope, but the thing it's
> > declared inside of is not.
> >
> > Use clauses don't work like that.
>
> I guess I'm still confused.

I don't think you're confused.  I think I wrote something confusing above.
Sorry about that.

>...  I just tried this:
>
> with Character_Vectors;  use Character_Vectors;
>
> procedure Test is
>     use type Ada.Containers.Hash_Type;
> begin
>     null;
> end Test;
>
> but GNAT is telling me that I'm
>
> "missing with for Ada.Containers"

Correct.  If you want to refer to Ada.Containers.Hash_Type,
you need to say "with Ada.Containers;".  I was assuming you would
have said "with Ada.Containers.Something;" already, but that's not
necessarily true.

I should probably admonish you to use the RM as the definition of the
language, rather than what one compiler happens to do.  ;-)
Chapters 8 and 10 explain all this -- but chapter 8 is pretty
tough going.

> I put a subtype declaration in the vectors package, like this:
>
> subtype Hash_Type is Containers.Hash_Type;
>
> and then I could say:
>
>     use type Character_Vectors.Hash_Type;

Yes, that could work.  However, that will make use-package clauses less
useful, because if you say "use Character_Vectors, Integer_Vectors;",
then the two Hash_Type's will conflict, and cancel each other out.

> But that's different from what Randy said to Nick:
>
>  >I agree, but not with your solution. Clearly, there should
>  >be a Size_Type next to Hash_Type in Ada.Containers. If you
>  >actually need to do math on it (which should be very rare),
>  >you'd need a "use type Ada.Containers.Size_Type;", but with
>  >any decent style, you'll need that no matter what the type
>  >is or where it is declared.
>
> I didn't know how to get "use type Ada.Containers.Size_Type;" to work
> without also with'ing Ada.Containers.

You're right.

>...  But perhaps Randy meant something
> else?  I'm not sure.
>
> If you want to declare
>
>     type Size_Type is range 0 .. <implementation-defined>;
>
> in Ada.Containers, I assumed you'd have to also declare a Size_Subtype
> in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this:
>
>     subtype Size_Subtype is Size_Type;
>
> and then the user would have to say:
>
> with Instantiation;  use type Instantiation.Size_Subtype;
>
> But that's different from saying "use type Ada.Containers.Size_Type;".

You're right.  I suggest that if Size_Type is declared in Containers,
let the programmer write "with Ada.Containers; use type
Ada.Containers.Size_Type;".  Declaring Size_Subtype causes the
"cancelling out" problem I mentioned above.  But I don't feel strongly
about this.  I do think my suggestion for Vectors solves the problems
better -- but not for sets/maps (unless you pass in the Size_Type as a
generic formal to those).

During the Ada 9X project, we considered a rule that if there are 17
potentially directly visible things called X, and they're all essentially
renamings of the same thing, then the compiler picks one at random.
But the rules would be pretty tricky, and the idea got dropped.

****************************************************************

From: Nick Roberts
Sent: Wednesday, February 18, 2004  12:23 PM

Apologies for this not being in response to anything anyone has
specifically said, but the containers topic has generated such a spate of
messages, it's difficult!

I would repeat (I'm sure I've said it before many times) that the container
packages /do not need/ indefinite forms, now or in the future.

The reason is simple:

(a) if you want to contain an indefinite type, and you want to abstract
away such low-level mechanics as memory management (quite rightly), all you
do is write a package that exports a definite private type, with the
required operations and other accoutrements (constants, support types and
subtypes), and encapsulates the underlying indefinite type inside that
definite type (almost certainly by using dynamic allocation);

(b) to support class-wide types or any indefinite types whose objects are
not dynamically allocated (so that memory management is not an issue), you
can contain an access type that designates them.

For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use
definite containers on unbounded strings without problems.

End of story, and hopefully end of argument.

Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D
reckoned this was good for maps and sets, but not vectors. I still disagree
with Bob about the vector package having its own Index_Type generic
parameter. I think that the practical advantages of having a pre-supplied
universal index type would greatly outweigh the advantages of the way it
currently is. Furthermore, I think Randy's idea has the merit of
echoing the approach taken by the existing *_IO packages. Why don't we have
something like this:

    type Count is range 0 .. [imp def];
    subtype Positive_Count is Count range 1..Count'Last;

declared in Ada.Containers, and then:

    generic

       type Element_Type is private;

       with function "=" (Left, Right : Element_Type)
          return Boolean is <>;

    package Ada.Containers.Vectors is

       pragma Preelaborate;

       type Vector_Type is private;

       function "=" (Left, Right : Vector_Type) return Boolean;

       function Max_Length (Vector : Vector_Type) return Count; -- was Length

       function Is_Empty (Vector : Vector_Type) return Boolean;

       procedure Clear (Vector : in out Vector_Type);

       procedure Swap (Left, Right : in out Vector_Type);

       procedure Append (Vector   : in out Vector_Type;
                         New_Item : in     Element_Type);

       procedure Insert (Vector   : in out Vector_Type;
                         Before   : in     Positive_Count;
                         New_Item : in     Element_Type);

       procedure Insert (Vector   : in out Vector_Type;
                         Before   : in     Positive_Count);

       procedure Insert_N (Vector   : in out Vector_Type;
                           Before   : in     Positive_Count;
                           How_Many : in     Count;
                           New_Item : in     Element_Type);

       ...

       function Length (Vector : Vector_Type) return Count; -- was Size

       procedure Resize (Vector     : in out Vector_Type;
                         New_Length : in     Count);

       -- function Front, Back ?

       function First (Vector : Vector_Type) return Positive_Count;

       ...

If the user felt it was important to have index type safety, or an index
base other than 1 -- and I don't think it will be often -- she could always
wrap an instantiation of Ada.Containers.Vectors in a package that provided it.

I could suggest a few more useful operations for vectors. How about vector
concatenation? Slicing?

I might suggest a constant Null_Vector, obviating the need for the Is_Empty
function and Clear procedure, but I must admit one disadvantage of such
constants is that they are not inherited. I've found this a small pain
occasionally. On the other hand, the test V = Foo.Null_Vector might be
considered better (more natural, more readable) than Is_Empty(V) and V :=
Foo.Null_Vector than Clear(V). But personally I'm not sure.

I'm none too keen on the

       generic
          type Element_Access is access all Element_Type;
       function Generic_Element (Vector : Vector_Type;
                                 Index  : Index_Type'Base)
          return Element_Access;

sub-package. It will surely constrain the implementation to declaring its
internal storage array(s) with aliased components. This could have some
pretty unfortunate effects on efficiency.

I really like the Generic_Sort. That would certainly be very handy.

By the way, I wonder if anyone has thought about a likely implementation of
this package. I know Matt's done a sample imp (which I haven't had time to
look at, sorry), but it seems to me that a reasonably efficient
implementation would not be very simple. Are we saying that implementations
are not expected to be very efficient, or that implementations are expected
to be sophisticated?

Another suggestion that I feel you should think about is a package that has
almost the same interface as A.C.Vectors, but whose container objects are
capable of being metamorphosed (perhaps implicitly, perhaps explicitly, or
perhaps both) between the array form (with fast random access) and the
linked-list form (with efficient appendage). This would fit very neatly
with typical usage: building by successively appending elements, followed
by usage that requires random access (sorting being the classic example).
In the light of this idea, might not a List (linked list) package actually
be more fundamentally useful, one that simply had an operation to convert the
list to an array?

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 18, 2004  1:21 PM

> If the user felt it was important to have index type safety, or an index
> base other than 1 -- and I don't think it will be often -- she could
> always wrap an instantiation of Ada.Containers.Vectors in a package that
> provided it.

The vector package will import a generic formal index type.


> I could suggest a few more useful operations for vectors. How about
> vector concatenation? Slicing?

This is an open issue, and I mentioned this in the errata list I sent
Randy this morning.

> I might suggest a constant Null_Vector, obviating the need for the
> Is_Empty function and Clear procedure, but I must admit one disadvantage
> of such constants is that they are not inherited. I've found this a
> small pain occasionally. On the other hand, the test V = Foo.Null_Vector
> might be considered better (more natural, more readable) than
> Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm
> not sure.

The vector will have Is_Empty and Clear operations.

> I'm none too keen on the
>
>       generic
>          type Element_Access is access all Element_Type;
>       function Generic_Element (Vector : Vector_Type;
>                                 Index  : Index_Type'Base)
>          return Element_Access;
>
> sub-package. It will surely constrain the implementation to declaring
> its internal storage array(s) with aliased components. This could have
> some pretty unfortunate effects on efficiency.

The aliasing of elements is an open issue (for other reasons), and was
included in the errata list I sent Randy this morning.

> I really like the Generic_Sort. That would certainly be very handy.
>
> By the way, I wonder if anyone has thought about a likely implementation
> of this package. I know Matt's done a sample imp (which I haven't had
> time to look at, sorry), but it seems to me that a reasonably efficient
> implementation would not be very simple. Are we saying that
> implementations are not expected to be very efficient, or that
> implementations are expected to be sophisticated?

It's implemented using an unconstrained array (that's why the container
is named "vector").  The implementation is as complicated as array
manipulation is.

The Generic_Sort in the reference implementation is implemented using a
quicksort algorithm, augmented with a median-of-3 to find the pivot.

> Another suggestion that I feel you should think about is a package that
> has almost the same interface as A.C.Vectors, but whose container
> objects are capable of being metamorphosed (perhaps implicitly, perhaps
> explicitly, or perhaps both) between the array form (with fast random
> access) and the linked-list form (with efficient appendage).

The vector is optimized for inserting at the back end of the container.
Append for a vector is O(1), just like a list is.  (The only
difference is that appending to a vector is "amortized" constant time.)

> This would
> fit very neatly with typical usage: building by successively appending
> elements, followed by usage that requires random access (sorting being
> the classic example).

That's exactly how a vector is intended to be used.  You do not need a
list to do what you have described.
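
For example (a sketch only: the formal part of Generic_Sort and the profile
of the resulting Sort are assumptions, the element type is assumed to be
Integer, and a use clause for the instance is assumed to be in effect):

    declare
       procedure Sort is new Generic_Sort ("<" => "<");
       V : Vector_Type;
    begin
       for K in 1 .. 1_000 loop
          Append (V, K mod 97);  --  build by appending: amortized O(1) each
       end loop;
       Sort (V);                 --  then sort in place for random access
    end;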

> In the light of this idea, might not a List
> (linked list) package actually be more fundamentally useful, that simply
> had an operation to convert the list to an array?

There is no list container in this version of the standard container
library.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 18, 2004  2:44 PM

On Wednesday 18 February 2004 18:22, Nick Roberts wrote:
> I would repeat (I'm sure I've said it before many times) that the container
> packages /do not need/ indefinite forms, now or in the future.
>
> The reason is simple:
>
> (a) if you want to contain an indefinite type, and you want to abstract
> away such low-level mechanics as memory management (quite rightly), all you
> do is write a package that exports a definite private type, with the
> required operations and other accoutrements (constants, support types and
> subtypes), and encapsulates the underlying indefinite type inside that
> definite type (almost certainly by using dynamic allocation);

A beaten argument. And self-contradictory: dynamic allocation *is* memory
management. The "level" of it does not matter. The user does not want to do
*any* memory management.

> (b) to support class-wide types or any indefinite types whose objects are
> not dynamically allocated (so that memory management is not an issue), you
> can contain an access type that designates them.

Sure.

> For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use
> definite containers on unbounded strings without problems.

Unbounded_String is in fact a wonderful container. And a paradigmatic example
of what the user expects from any container. So more ammo against (a).

> End of story, and hopefully end of argument.

Unfortunately no.

> Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D
> reckoned this was good for maps and sets, but not vectors. I still disagree
> with Bob about the vector package having its own Index_Type generic
> parameter. I think that the practical advantages of having a pre-supplied
> universal index type would greatly outweigh the advantages of having the
> way it currently is.

I agree. I'm taking the chance to express myself on this issue. For me the
index type could be simply Positive, like in Unbounded_Arrays (a package I
presented at the ASCL Workshop, echoes of which can still be heard in the
current proposal e.g. Resize and unassigned elements).

/*
> Furthermore, I think Randy's idea has the merit of
> echoing the approach taken by the existing *_IO packages.

I never liked this _Count business but ok.
*/

> I might suggest a constant Null_Vector...

No please.

> . . . the test V = Foo.Null_Vector might be
> considered better (more natural, more readable) than Is_Empty(V) and V :=
> Foo.Null_Vector than Clear(V).

Not to me, no.

> I'm none too keen on the
>
>        generic
>           type Element_Access is access all Element_Type;
>        function Generic_Element (Vector : Vector_Type;
>                                  Index  : Index_Type'Base)
>           return Element_Access;
>
> sub-package. It will surely constrain the implementation to declaring its
> internal storage array(s) with aliased components. This could have some
> pretty unfortunate effects on efficiency.

And it's not terribly useful either. If the user wants to do pointer
programming he can do that himself with containers of pointers, no?

> . . .
> Another suggestion that I feel you should think about is a package that has
> almost the same interface as A.C.Vectors, but whose container objects are
> capable of being metamorphosed

If it's another package with a similar interface then just make it a list,
don't complicate it with metamorphosing. I tried a similar stunt with Truc but
then I saw the light :-)

> ... linked-list form (with efficient appendage)

You mean insertion. Appending can be efficient with vectors too.

> In the light of this idea, might not a List (linked list) package actually
> be more fundamentally useful, that simply had an operation to convert the
> list to an array?

Maybe. Personally I wouldn't mind at all seeing a list package there.
Paralleled by a reduction of the vectors interface. Once you have lists, you
don't need the (inefficient) insertion and deletion in the middle of vectors
anymore. And as said above, remove pointer programming support--in all
structural varieties (vectors, lists, maps, sets). The total reduction would
make plenty of space for the so much wanted--and rightfully so--list.

* Indefinite elements revisited : an alternative : elementary containers *

I think we all agree that the main rationale for having indefinite elements is
freeing the user from doing memory management. Many people do not like, want, or
know how, to dance with pointers.

I and Matt have already shown how indefinite elements can be added to the
proposal, with packages paralleling the ones for definite elements, defined
in a one-page annex.

An alternative is to provide a minimal package of 'elementary containers' that
does the required encapsulation of an indefinite inside a definite, which the
user can then use to instantiate 'normal' containers. This alternative has
the virtue of focusing on the main requirement (freeing the user from doing
memory management).

  generic
    type Element_Type (<>) is private;
  package Ada.Containers.Elementary is
    type Container_Type is private;
    function Put (Item : Element_Type) return Container_Type;
    function Get (Container : Container_Type) return Element_Type;
  end;

  package Boxes is new Elementary (My_Indef_Type);
  package My_Vectors is new Vectors (Boxes.Container_Type);
  use Boxes, My_Vectors;
  V : My_Vectors.Vector_Type;
  Append (V, Put (My_Indef_Object));
  My_Op_Upon_The_Indef_Type (Get (Element (V, 1)));

For a 'real' example see the implementation of Truc
(www.liacc.up.pt/~maa/containers).

This breaks the only-one-instantiation requirement but it is for a good cause
:-)

Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of
indefinite elements, so we can safely assume all the others will be happy
too, and the standard will be embraced by ALL :-)

Note the minimal container is useful also for other situations, e.g. for
making a (core language) array of indefinite elements:

  A : array (1 .. 10) of Boxes.Container_Type;
  A (1) := Put (My_Indef_Object);

And remember you have memory magic i.e. when you write

  A (1) := Put (Another_Indef_Object);

the previous value is cleanly disposed of.

Compare this with all the stuff you have to write (and review, and debug, and
test, and...) to get the same effect with core language devices.  (Well, this
is just backing up the rationale above.)

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 18, 2004  5:55 PM

> > I'm none too keen on the
> >
> >        generic
> >           type Element_Access is access all Element_Type;
> >        function Generic_Element (Vector : Vector_Type;
> >                                  Index  : Index_Type'Base)
> >           return Element_Access;
> >
> > sub-package. It will surely constrain the implementation to declaring its
> > internal storage array(s) with aliased components. This could have some
> > pretty unfortunate effects on efficiency.
>
> And it's not terribly useful either. If the user wants to do pointer
> programming he can do that him self with containers of pointers, no?

I think the idea is to allow update-in-place of elements (which matters if
the elements are large or indefinite). It's likely to be more necessary with
Maps than with Vectors, but it's better to have the same operations for all
of the containers.

It wouldn't be necessary to use a generic formal for this purpose, of
course, just put an access type in here:
    type Element_Access is access all Element_Type;
    function Writable_Element (Vector : Vector_Type;
                               Index  : Index_Type'Base)
         return Element_Access;

That's a bit less flexible, but probably flexible enough if the primary
purpose is a reference.
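
For example, with a large record element type, a single component can then be
updated in place without copying the rest of the element (the Hit_Count
component is purely illustrative):

    Writable_Element (V, Index => 10).Hit_Count :=
       Writable_Element (V, Index => 10).Hit_Count + 1;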

...
> An alternative is to provide a minimal package of 'elementary containers' that
> does the required encapsulation of an indefinite inside a definite, which the
> user can then use to instantiate 'normal' containers. This alternative has
> the virtue of focusing on the main requirement (freeing the user from doing
> memory management).

I tend to prefer the two packages mechanism. That's because having the local
memory management also makes the proportionality constant for Inserts and
Sorts much less, and I'd not want to lose that.

Indeed, if the proposal was adopted with both Definite and Indefinite
element types, I'd suggest using the Indefinite version for
large/expensive-to-copy element types even if the type is definite and any
amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two
implementations would be identical, but that would be unusual, and I
wouldn't recommend anyone depend on that.) The Definite version would be
best for small element types (like access types), because it would have a
lot less overhead for adding an item and destroying the container.

****************************************************************

From: Nick Roberts
Sent: Wednesday, February 18, 2004  4:35 PM

Marius Amado Alves wrote:

> Personally I wouldn't mind at all seeing a list package there.

Indeed, and I feel the argument for a list package is really stronger than
for a vectors one. With a list container, you can do all the insertion and
deletion you like perfectly efficiently, and then just convert it to an
array for random access. What's wrong with that? Why then would vectors be
needed at all?

 > Many people do not like, want, or know how, to dance with pointers.

I completely agree with this.

> I and Matt have already shown how indefinite elements can be added to the
> proposal, with packages paralleling the ones for definite elements, defined
> in a one-page annex.

Yuk.

> An alternative is to provide a minimal package of 'elementary containers' that
> does the required encapsulation of an indefinite inside a definite, which the
> user can then use to instantiate 'normal' containers. This alternative has
> the virtue of focusing on the main requirement (freeing the user from doing
> memory management).

Brilliant! I think this is a superb idea. Maybe we could term a container
of this kind a 'keeper'; I'm sure someone can come up with a better one.

    with Ada.Finalization; -- for private part only

    generic
      type Element_Type (<>) is private;

    package Ada.Containers.Keepers is

       type Keeper is private;

       function To_Keeper (Item : Element_Type) return Keeper;

       function Empty_Keeper return Keeper;

       function Value (Source : Keeper) return Element_Type;

       function Is_Empty (Source : Keeper) return Boolean;

       procedure Clear (Source : in out Keeper);

       procedure Replace (Source : in out Keeper;
                          By     : in     Element_Type);

    private

       type Element_Access is access Element_Type;

       type Keeper is new Ada.Finalization.Controlled with
          record
             Ref: Element_Access; -- null for empty
          end record;

    end;

    package My_Keepers is new Ada.Containers.Keepers(My_Indef_Type);

    package My_Vectors is new Ada.Containers.Vectors (My_Keepers.Keeper);

    use My_Keepers, My_Vectors;

    V : My_Vectors.Vector_Type;

    Append( V, To_Keeper(My_Indef_Object) );

    My_Op_Upon_The_Indef_Type( Value( Element(V,1) ) );

Possibly 'To_Keeper' should be named 'Make_Keeper' or 'New_Keeper'. I've
shown the likely implementation of the Keeper type.
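
A corresponding body might look roughly like this. It is a sketch only: it
assumes that overriding declarations of Adjust and Finalize are added to the
private part of the spec above (which I omitted), and it does no error
handling beyond what the language gives for free:

    with Ada.Unchecked_Deallocation;

    package body Ada.Containers.Keepers is

       procedure Free is new Ada.Unchecked_Deallocation
         (Element_Type, Element_Access);

       function To_Keeper (Item : Element_Type) return Keeper is
       begin
          return (Ada.Finalization.Controlled with Ref => new Element_Type'(Item));
       end To_Keeper;

       function Empty_Keeper return Keeper is
       begin
          return (Ada.Finalization.Controlled with Ref => null);
       end Empty_Keeper;

       function Value (Source : Keeper) return Element_Type is
       begin
          return Source.Ref.all;  -- raises Constraint_Error if the keeper is empty
       end Value;

       function Is_Empty (Source : Keeper) return Boolean is
       begin
          return Source.Ref = null;
       end Is_Empty;

       procedure Clear (Source : in out Keeper) is
       begin
          Free (Source.Ref);      -- deallocates and sets Ref back to null
       end Clear;

       procedure Replace (Source : in out Keeper;
                          By     : in     Element_Type) is
       begin
          Free (Source.Ref);
          Source.Ref := new Element_Type'(By);
       end Replace;

       procedure Adjust (Object : in out Keeper) is
       begin
          if Object.Ref /= null then
             Object.Ref := new Element_Type'(Object.Ref.all);  -- deep copy on assignment
          end if;
       end Adjust;

       procedure Finalize (Object : in out Keeper) is
       begin
          Free (Object.Ref);      -- Free on a null access value is a no-op
       end Finalize;

    end Ada.Containers.Keepers;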

> Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of
> indefinite elements, so we can safely assume all the others will be happy
> too, and the standard will be embraced by ALL :-)

I really REALLY like Marius' idea here. Yes please!

> Note the minimal container is useful also for other situations, e.g. for
> making an (core language) array of indefinite elements:
>
>   A : array (1 .. 10) of Boxes.Container_Type;
>   A (1) := Put (My_Indef_Object);

or alternatively:

    A : array (1 .. 10) of My_Keepers.Keeper;
    Replace( A(1), My_Indef_Object );

which might be slightly more efficient.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 18, 2004  7:32 PM

...
> Indeed, and I feel the argument for a list package is really stronger than
> for a vectors one. With a list container, you can do all the insertion and
> deletion you like perfectly efficiently, and then just convert it to an
> array for random access. What's wrong with that? Why then would vectors be
> needed at all?

That's going to be very expensive if the length of the list is very long
and/or copying the elements is expensive. Matt's design tries to avoid
copying elements as much as possible, and he's particularly concerned with
the containers being able to 'scale-up' to large numbers of elements.

If the sequence (I'm using the general term here) doesn't have very big
elements and can't get very long, you don't need any fancy container to hold
it. Just declare an array of the maximum size and use it.

The value of any container is when one or both of those things is true, and
you do need the memory management implied by a container. And, if you can
only have one sequence container, the vector container (which allows
computed access to elements) is more flexible than the list container (which
doesn't). Besides, a useful list is a lot easier to write than a useful
growable array.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, February 19, 2004  6:51 AM

On Wednesday 18 February 2004 22:34, Nick Roberts wrote:

[Lists and Vectors]

> Marius Amado Alves wrote:
> > Personally I wouldn't mind at all seeing a list package there.
>
> Indeed, and I feel the argument for a list package is really stronger than
> for a vectors one.

I don't feel that way.

> With a list container, you can do all the insertion and
> deletion you like perfectly efficiently, and then just convert it to an
> array for random access. What's wrong with that?

Efficiency. Surely you cannot convert a list of a zillion elements just like
that.

> Why then would vectors be
> needed at all?

See above. And also, you often need the precise vector abstraction. Let it be
there ready for use. Just add the precise list abstraction. They will live
there happily side by side.

[Elementary Containers]

>     generic
>       type Element_Type (<>) is private;
>     package Ada.Containers.Keepers is
>        type Keeper is private;
>        function To_Keeper (Item : Element_Type) return Keeper;
>        function Empty_Keeper return Keeper;
>        function Value (Source : Keeper) return Element_Type;
>        function Is_Empty (Source : Keeper) return Boolean;
>        procedure Clear (Source : in out Keeper);
>        procedure Replace (Source : in out Keeper;
>                           By     : in     Element_Type);
>     private...

Looks good. Compare with this 'real code' example from AI302/2:

  generic
    type Element (<>) is private;
    type Element_Ptr is access all Element;
    type Container is private;
    with procedure Put (C : in out Container; E : Element) is <>;
    with function Put (E : Element) return Container is <>;
    with function Get (C : Container) return Element is <>;
    with procedure Delete (C : in out Container) is <>;
    with function Access_Of (C : Container) return Element_Ptr is <>;
    with function "=" (L, R : Container) return Boolean is <>;
    with procedure Overwrite (C : Container; E : Element) is <>;
    with function Img (C : Container) return String is <>;
  package Signature is end;

Operations side by side:

Minimal     AI302/2          Nick              Remark
------------------------------------------------------------------
Put(E)->C   Put(E)->C        To_Keeper(E)->C   yes, Insert
Get(C)->E   Get(C)->E        Value(C)->E       yes, Element
            Put(ioC,E)       Replace(ioC,E)    yes, Replace(C,E)?
            Delete(ioC)      Clear(C)          yes, Clear
            Access_Of(C)->P                    for update-in-place
            "="(C,C)->B                        no
            Overwrite(C,E)                     for update-in-place
            Img(C)->S                          no
                             Empty_Keeper->C   no
------------------------------------------------------------------

Abbreviations:

E = element type
C = container type
-> = returns
io = in out
B = Boolean
P = pointer to element

In the remarks:

A "yes" means the operation is definitely a go, with the indicated name for
consistency with AI302/3.

The remark "Replace(C,E)?" is associated with the fact that in AI302/3 the
container parameter of the Replace_Element operation for vectors is just in,
not in out. But in the corresponding operation for maps the container
parameter is in out. Only the ARG and/or Matt can explain this.

The two "for update-in-place" operations:

Access_Of is like the Generic_Element (terrible name) of AI302/3 vectors.

Overwrite(C,E) is logically equivalent to Access_Of (C).all := E.

Overwrite is the update-in-place operation distilled. So if Access_Of (or
Generic_Element) is there just for update-in-place it can be dropped from the
interface.

In C++ Overwrite is dangerous if the new element is bigger than the previous.
I hope Ada can avert this, or at least detect it and raise an exception.

Whatever you do, leave a means for update-in-place in the interface. Albeit
dangerous (?), it is very useful for efficient replacement when the user
knows that the sizes are equal.

* Names. Finalising a proposal *

"Keeper" is too colloquial, no? And has a connotation to football. "Cell"
would be a better metaphor. Of course the container type name and the other
names must get along with each other e.g.

package       container type  element type
------------------------------------------
Elementary    Container_Type  Element_Type
Cells         Cell_Type       Element_Type
Cells         Cell_Type       Value_Type
------------------------------------------

If there are no essential disagreements with this proposal, Nick and I (?)
will try to formalise a proposal, with the options indicated above.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, February 19, 2004  8:25 AM

On Wednesday 18 February 2004 23:48, Randy Brukardt wrote:

[Operations for update-in-place]

> Marius Amado Alves wrote (responding to Nick Roberts):
> > > I'm none too keen on the
> > >
> > >        generic
> > >           type Element_Access is access all Element_Type;
> > >        function Generic_Element (Vector : Vector_Type;
> > >                                  Index  : Index_Type'Base)
> > >           return Element_Access;
> > >
> > > sub-package. It will surely constrain the implementation to declaring its
> > > internal storage array(s) with aliased components. This could have some
> > > pretty unfortunate effects on efficiency.
> >
> > And it's not terribly useful either. If the user wants to do pointer
> > programming he can do that him self with containers of pointers, no?
>
> I think the idea is to allow update-in-place of elements (which matters if
> the elements are large or indefinite).

If large, yes. If indefinite, not quite. You have to deal with possibly
different sizes. See my previous message in reply to Nick.

> It's likely to be more necessary
> with Maps than with Vectors,

I don't see why, but ok.

> but it's better to have the same operations
> for all of the containers.

Ok.

> It wouldn't be necessary to use a generic formal for this purpose, of
> course, just put an access type in here:
>     type Element_Access is access all Element_Type;

Yes, please do that. The generic breaches the only-one-instantiation
requirement.

[Indefinite elements]

> I tend to prefer the two packages mechanism. That's because having the
> local memory management also makes the proportionality constant for Inserts
> and Sorts much less

If I understand correctly, not quite. Not the "much" anyway. See the
provisions for update-in-place for elementary containers in my previous
message (in reply to Nick).

> Indeed, if the proposal was adopted with both Definite and Indefinite
> element types, I'd suggest using the Indefinite version for
> large/expensive-to-copy element types even if the type is definite and any
> amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two
> implementations would be identical, but that would be unusual, and I
> wouldn't recommend anyone depend on that.) The Definite version would be
> best for small element types (like access types), because it would have a
> lot less overhead for adding an item and destroying the container.

Note this only applies to *inherently* inefficient operations, e.g.
inserting/deleting in vectors. And, again, provisions for update-in-place for
elementary containers minimize the 'problem'.

And shouldn't we avoid mingling definiteness and largeness? They are
independent factors.

Personally, as a user, I'm happy with either solution (Annex <IE> or
elementary containers). I can easily construct either one from the other.

But as an implementer I would prefer the elementary containers solution,
because it is so much less trouble. I'm surprised that the real compiler writer
Randy feels the contrary.

And it seems much less work for conformance testing also.

And it probably eases the specification also. Annex <IE> is a bit strange and
bug-prone, because it is assuming that a lot about definite elements
transposes to indefinite. We already found some "anomalies". Elementary
containers is just a 'normal' spec. It does not require any *combined*
testing with the other containers. The user can easily derive by himself any
theorems about a container of elementary containers from the two independent
specs.

And I think everybody prefers a standard that just shows a package spec--over
one that defines one in English.

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 19, 2004  6:21 PM

Marius Amado Alves:

> > I think the idea is to allow update-in-place of elements (which matters
> > if the elements are large or indefinite).
>
> If large yes. If indefinite not quite. You have to deal with possibly
> different sizes.

Well, usually it would be used to update parts (components) of elements, not
the entire thing. If you're going to update the whole thing, use the safer
Replace_Element. Indefinite elements have components, too.
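
As a concrete illustration (a sketch only; Word_Info, its Count component,
Word_Vectors -- an instantiation of the vectors package with Word_Info as its
Element_Type -- and the objects V and I are all invented for the example),
Generic_Element lets you touch one component without copying the element:

   type Word_Info is record
      Count : Natural := 0;
   end record;

   type Word_Info_Access is access all Word_Info;
   function Ref is new Word_Vectors.Generic_Element (Word_Info_Access);

   --  Later, update one component in place; the whole element is not copied:
   Ref (V, I).Count := Ref (V, I).Count + 1;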

> [Indefinite elements]
>
> > I tend to prefer the two packages mechanism. That's because having the
> > local memory management also makes the proportionality constant for
> > Inserts and Sorts much less
>
> If I understand correctly, not quite. Not the "much" anyway. See the
> provisions for update-in-place for elementary containers in my previous
> message (in reply to Nick).

For most implementations, it will make them much less. The canonical
implementation of a definite element is:

    type Internal_Array is
       array (Index_Type range <>) of aliased Element_Type;

while for indefinite element is:

    type Element_Access is access all Element_Type;
    type Internal_Array is array (Index_Type range <>) of Element_Access;

so, when you're moving buckets for an insert, you're copying whole elements
in the definite case, and just pointers in the indefinite case. If element
copy is expensive (lots of controlled components, for instance), that can
make a huge difference.
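
To picture the cost difference (a sketch only; Data, I, and Last are invented
names for the internal array and positions):

   --  Making room for an insertion at position I shifts the tail up a slot:
   Data (I + 1 .. Last + 1) := Data (I .. Last);
   --  Definite case: the slots hold Element_Type values, so whole elements
   --  are copied (including any controlled-component overhead).
   --  Indefinite case: the slots hold Element_Access values, so only
   --  pointers are moved.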

> Note this only applies to *inerently* inefficient operations e.g.
> inserting/deleting in vectors.

Of course. But if you're using them a lot, it matters.

> And, again, provisions for update-in-place for
> elementary containers minimize the 'problem'.

I have no idea what you mean. When you have to copy an element, you have to
copy it. If "elementary containers" (BTW, that name is horrible, because
"elementary" means scalar and access types in Ada, and that is not what you
mean here) uses controlled types and does reference counted shallow copies,
it could avoid some overhead -- but at the cost of a lot of complexity.

> But as an implementer I would prefer the elementary containers solution,
> because it is so less trouble. I'm surprised that the real compiler writer
> Randy feels the contrary.

For us (because of generic sharing), there is no difference between definite
and indefinite elements. The compiler will internally transform
"Element_Type" into "Element_Access" (because the size and contents of the
actual type are unknown). Which is why I'm completely opposed to any
semantics differences between them.

And, because of that, your proposed solution would mean that both containers
would end up doing memory management. So everything would end up allocated
twice (the actual element, and then the "elementary container"). That would
cause serious heap fragmentation problems (Windows is not good at handling
that), and I fear that the combination would be effectively unusable. At
which point we're out of business (changing the implementation of generics
is not an option).

For me, all of the elements should be indefinite, period. We don't need
definite versions. (That would make Janus/Ada look good, our implementation
would be competitive. :-) But I understand why no one else thinks that.

> And it seems much less work for conformance testing also.

Since the semantics are identical for the two packages, use the same tests
(with different types). Much less work than writing two sets of tests from
scratch.

> And it probably eases the specification also. Annex <IE> is a bit strange
> and bug-prone, because it is assuming that a lot about definite elements
> transposes to indefinite. We already found some "anomalies".

Yes, but those are bugs in the design of the container. Do we really want to
be able to put random junk into containers? I don't think so.

There would be a problem if we were to decide to add array operations (since
indefinite can't be a component), but that's far from decided.

...
> And I think everybody prefers a standard that just shows a package
> spec--over one that defines one in English.

That is precisely how all of the Wide_String packages work, and they haven't
caused a lot of problems. Indeed, the advantage of the indefinite packages
is that they are *very* small in terms of standard wording and "weight"
(that is, there are no new concepts to learn and understand with them).
That's not true of "elementary containers".

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 19, 2004  7:07 PM

Randy Brukardt wrote:

> Of course. But to me, a hash table is just a table (array); collision
>  handling is not part of it. It's a necessary part of a component, of
>  course, which is why it's impossible to have a hash table component.

OK. That's not the definition of a hash table that I learned, but we're
not really in disagreement. I'm curious, though: if a hash table is just
an array, what are the index and component types?

> Which of course is exactly the argument I've been making all along.
> Of course, then the Sorted_Set and the Vector are also good enough --
>  which is quite contrary to your position.

I'd be perfectly happy to not have a hash table or anything based on
one. If they exist, though, I might choose to use a hash table based on
expected performance for a specific application, and I would want to be
able to use it without an ugly kludge. If they exist, I think the
implementation should be available as well as the higher-level components.

****************************************************************

From: Stephen Leake
Sent: Friday, February 20, 2004  3:34 AM

Matthew Heaney <mheaney@on2.com> writes:

> In the case of the STL, what happens it that you specify an iterator
> pair designating the half-open range of the source container.  The
> vector probably computes the distance() first, then does the internal
> expansion, and then walks the source range constructing each new
> vector element in place.
>
> For a std::vector, the distance() function is specialized so that it
> computes the distance in constant time (because vector iterators are
> random access iterators, and therefore distance() can be implemented
> for a vector by simple subtraction).
>
> We can't get this sophisticated in Ada, but we can be almost as
> efficient.  Instead of the vector itself calling distance(), it's the
> vector user who computes the distance (by whatever method makes
> sense), and then calls Insert_N to do the preallocation.

Hmm. We could require a source container signature package, that
includes cursors and Distance; that should give the same efficiency as
C++ STL. We probably don't want that for ai302.

...
> > So I'm affirming that deleting the itemless insertion from the
> > indefinite map is ok.
>
> I think they need to stay.  If nothing else the definite and
> indefinite forms require a more or less identical interface.

Ok, I agree with you; itemless insert is useful and should be in the
indefinite containers.

However, the intended use is that they be immediately followed by a
Replace operation, which specifies the item for each element. So
itemless insert should just insert null pointers in the underlying
container, and any operation that accesses an itemless element should
raise Constraint_Error, since it indicates a user error.
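
A minimal sketch of that intended usage (the operation profiles and parameter
names here are assumptions, not the proposal text; M is an indefinite map with
String keys and Integer elements):

   Insert (M, Key => "apple", Position => C, Success => Inserted);
   --  Here the element slot holds no item yet (internally a null pointer);
   --  fetching the element now would raise Constraint_Error.
   Replace_Element (M, C, By => 1);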


I've looked thru your indefinite_vectors package. Why do you have both
type VT and type Vector_Type?

****************************************************************

From: Matthew Heaney
Sent: Friday, February 20, 2004  1:50 PM

> Hmm. We could require a source container signature package, that
> includes cursors and Distance; that should give the same efficiency as
> C++ STL. We probably don't want that for ai302.

It's not necessary.  In the current design it just means you have to
supply the count yourself and do the vector pre-insert, then use your
favorite iteration method (over the target, over the source, active,
passive, etc, etc) to do the actual vector insert.

> Ok, I agree with you; itemless insert is useful and should be in the
> indefinite containers.
>
> However, the intended use is that they be immediately followed by a
> Replace operation, which specifies the item for each element.

Yes, that is correct.  The state of the container immediately following
the pre-insert (what I've been calling "Insert_N") is intended only as a
temporary state, as a prelude to some form of replacement of the
elements in the newly-allocated slots.

> So itemless insert should just insert null pointers in the underlying
> container, and any operation that accesses an itemless element should
> raise Constraint_Error, since it indicates a user error.

Yes, for the indefinite form, an item-less insert would give each new
slot the value null (the original non-null values in those positions
would slide up), in anticipation of its replacement by a non-null value.

> I've looked thru your indefinite_vectors package. Why do you have both
> type VT and type Vector_Type?

It's a bit of a trick.  I used transitivity of visibility to make the
operations of the type directly visible.

****************************************************************

From: Marius Amado Alves
Sent: Friday, February 20, 2004  5:51 AM

On Friday 20 February 2004 00:20, Randy Brukardt wrote:

> ... When you have to copy an element, you have to
> copy it. If "elementary containers" (BTW, that name is horrible...

The correct name would be "uni-elementary containers". For some reason I lost
the "uni-". I'm considering changing to "cells".

> ... everything would end up
> allocated twice (the actual element, and then the "elementary container".
> That would cause serious heap fragmentation problems (Windows is not good
> at handling that), and I fear that the combination would be effectively
> unusable.

Serious problems? Effectively unusable? Are you sure? Just because of one more
level of allocation? For such small things as pointers? Have you forgotten
that high performance is not required?

> ... For me, all of the elements should be indefinite, period.

For me too!

> ... But I understand why no one else thinks that.

I don't (understand)!

> > ... one that defines one in English.
>
> That is precisely how all of the Wide_String packages work, and they
> haven't caused a lot of problems.

I know, but String to Wide_String is not a quantum leap like definite to
indefinite.

> Indeed, the advantage of the indefinite
> packages is that they are *very* small in terms of standard wording and
> "weight" (that is, there is no new concepts to learn and understand with
> them).

Only if the transposition is exceptionless. That is, no "anomalies". Can we
assure that? I fear a flood of Ada Questions (?) beginning 2005.

> That's not true of "elementary containers".

Yes, but the new concept (the cell:-) is minimal, useful, "brilliant", natural
to every programmer.

In sum, we have three solutions to choose from, with pros and cons:

                                Only    Def.+   Def.+
                                indef.  indef.  cells
------------------------------------------------------
Changes to AI-302/3             many    few     few
Reference implementation        no      yes     yes
One more useful structure       no      no      yes
Janus issues                    no      no      yes
...

****************************************************************

From: Randy Brukardt
Sent: Friday, February 20, 2004  7:32 PM

Marius Amado Alves wrote:

...
> The correct name would be "uni-elementary containers". For some reason I
> lost the "uni-". I'm considering changing to "cells".

"Cells" seems better to me. Short is always good!

> > ... everything would end up
> > allocated twice (the actual element, and then the "elementary container").
> > That would cause serious heap fragmentation problems (Windows is not
> > good at handling that), and I fear that the combination would be effectively
> > unusable.
>
> Serious problems? Effectively unusable? Are you sure? Just because of one
> more level of allocation? For such small things as pointers? Have you
> forgotten that high performance is not required?

Well, a "Cell" (which is doing memory management) is not a pointer, it's a
controlled object containing a pointer. (Otherwise, the memory wouldn't be
recovered on scope exit, which is a clear no-no.) So that means its size is
more like 20 bytes for Janus/Ada. So I think I was wrong about fragmentation
problems (it is big enough to avoid those). But it certainly would be a
potential problem for memory use (if there are a lot of them), and a lot more
overhead when items are copied (calls to Finalize and Adjust for each item,
which the directly indefinite version would not have - it wouldn't need
controlled elements as the container itself is controlled). Of course, this
doesn't matter in truly low performance applications, but there are a lot of
middle ground applications in which that could matter.

> > ... But I understand why no one else thinks that.
>
> I don't (understand)!

Bounded forms need to have definite components (the reason for bounded forms
is to have little or no dynamic memory management; it defeats the purpose to
then dynamically allocate the elements). We need to leave room for future
enhancements. Similarly, there is a lot less dynamic memory management
with definite elements. Most implementers claim that's important to their
customers (they want repeatability). (It better not be important to
Janus/Ada customers, because we allocate a lot of things dynamically and
non-contiguously.) I have to trust their judgement.

...
> > Indeed, the advantage of the indefinite
> > packages is that they are *very* small in terms of standard wording and
> > "weight" (that is, there are no new concepts to learn and understand
> > with them).
>
> Only if the transposition is exceptionless. That is no "anomalies". Can we
> assure that? I fear a flood of Ada Questions (?) beginning 2005.

Well, I'm not worrying that the ARG is going to run out of work no matter
what ends up in the Amendment. I fully expect a flood of questions on the
containers. Almost all of the packages in Ada 95 (except the ones defined in
previous standards) generated a lot of questions. Why would this Amendment
be different??

> Yes, but the new concept (the cell:-) is minimal, useful, "brilliant",
> natural to every programmer.

One more minor advantage to indefinite element containers: they only require
one instantiation to use. The "cell" solution requires two.

****************************************************************

From: Nick Roberts
Sent: Saturday, February 21, 2004  12:54 PM

> "Cells" seems better to me. Short is always good!

I like that name too. Splendid.

> Well, a "Cell" (which is doing memory management) is not a pointer, it's
> a controlled object containing a pointer. (Otherwise, the memory
> wouldn't be recovered on scope exit, which is a clear no-no.) So that
> means its size is more like 20 bytes for Janus/Ada. So I think I was
> wrong about fragmentation problems (it is big enough to avoid those).
> But it certainly would be a potential problem for memory use (if there
> are a lot of them), and a lot more overhead when items are copied (calls
> to Finalize and Adjust for each item, which the directly indefinite
> version would not have - it wouldn't need controlled elements as the
> container itself is controlled). Of course, this doesn't matter in truly
> low performance applications, but there are a lot of middle ground
> applications in which that could matter.

It may or may not be a problem for memory use. The size of one 'cell'
object would, as you say, be in the ballpark of 20 bytes (a tag and
a linked-list 'next' access value, in addition to the access value referring
to the contained indefinite object). I reckon the overhead (the tag and the
next pointer) is likely to be 8 bytes in most cases, although it could be
quite a lot more. However, if the average size of each contained object is
significantly more than this overhead, it is unlikely to be really
significant (it may be a little annoying). If the indefinite objects are
relatively small on average, it matters. I'm not really sure, myself, which
scenario will be prevalent in practice.

> One more minor advantage to indefinite element containers: they only
> require one instantiation to use. The "cell" solution requires two.

The extra instantiation could be somewhat amortised away in some (perhaps
many) realistic situations.

    type Fragment_Count is range 0..2000;
    subtype Fragment_Number is Fragment_Count range 1..Fragment_Count'Last;

    package Gene_Fragments is new Ada.Containers.Cells(Gene_Array);
    subtype Gene_Fragment is Gene_Fragments.Cell; use Gene_Fragments;

    package Fragment_Gangs is
       new Ada.Containers.Vectors(Fragment_Number,Gene_Fragment);
    subtype Fragment_Gang is Fragment_Gangs.Vector; use Fragment_Gangs;

    type Fixed_Gang is array (Fragment_Number range <>) of Gene_Fragment;

    Sample: constant Fixed_Gang := Ref_Samp_1 & Ref_Samp_2;

Here, the instantiation of a cell package permits us to declare an array of
cells in addition to a vector of them. I feel that cells would quite often
be useful for purposes other than allowing a (definite) container to
contain indefinite objects.

An advantage of definite containers over their indefinite counterparts is
that they permit conversion to and from arrays (including the slicing of a
linear container). The cell technique would have the extra advantage that,
since only definite containers are used, these array operations would
remain available. I feel that in itself could be quite a compelling argument.

In my ignorance, could I ask please what the presumed (proper)
implementation of Vectors is?

In my mind there forms a picture of a tree structure with the leaves containing
(or pointing to) actual arrays which form fragments of the whole conceptual
array. Each fragment would have a counter saying how many of its elements
are actually used. Appending an element would require adding a leaf node if
there was no more space in the end fragment. Random selection of an element
would require descending the tree. Am I way off the mark?

If I'm not way off the mark, I would contend that building a linked list
and converting to an array (for subsequent random access) would be likely
to be superior (to building a vector and selecting randomly from it by tree
descent) in a majority of cases in practice.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 23, 2004  6:23 PM

> In my ignorance, could I ask please what the presumed (proper)
> implementation of Vectors is?

See the files

ai302-containers-vectors.ad?
ai302-containers-indefinite_vectors.ad?

in the latest reference implementation for the details.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040223.zip>

This implementation has a few more examples, some of which use the new
indefinite vector container.  Look thru the anagram examples for some ideas.

> In my mind forms a picture of a tree structure with the leaves containing
> (or pointing to) actual arrays which form fragments of the whole conceptual
> array. Each fragment would have a counter saying how many of its elements
> are actually used. Appending an element would require adding a leaf node if
> there was no more space in the end fragment. Random selection of an element
> would require descending the tree. Am I way off the mark?

A vector is implemented as an unconstrained array.

> If I'm not way off the mark, I would contend that building a linked list
> and converting to an array (for subsequent random access) would be likely
> to be superior (to building a vector and selecting randomly from it by tree
> descent) in a majority of cases in practice.

To convert between container types, just use one of the iterators.

The container library takes pains to give the library user easy and
efficient access to the container elements (that means the actual
objects).

It is never the case that a container needs, say, an operation to
convert itself to an array specifically.  A container iterator allows
the library user himself to choose the target type, whatever makes the
most sense for him.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 24, 2004  10:27 AM



Nick Roberts wrote:
>
> I might suggest a constant Null_Vector, obviating the need for the
> Is_Empty function and Clear procedure, but I must admit one disadvantage
> of such constants is that they are not inherited. I've found this a
> small pain occasionally. On the other hand, the test V = Foo.Null_Vector
> might be considered better (more natural, more readable) than
> Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm
> not sure.

This won't work, because the vector type privately derives from
Controlled, and therefore you can't declare a constant of the type in a
package with preelaborate categorization.

However, a constructor function would work.  Here are some ideas:

   function Null_Vector return Vector_Type;

   function Empty_Vector return Vector_Type;

   function New_Vector return Vector_Type;

   function To_Vector (Length : Size_Type) return Vector_Type;

   function To_Vector (New_Item : Element_Type;
                       Count    : Size_Type)
      return Vector_Type;


I actually had a need for something like this in one of my examples.

It's kind of a pain that the language doesn't give you a default
constructor for a type that you can pass as a parameter.  For example,
in C++ I can say:

    container.insert(T());

where T() invokes the default ctor for the element type T.

Ada does let you do something like this, when constructing an aggregate:

    type NT is new T with record
       I : Integer;
    end record;

    Object : constant NT := (T with I => 42);

Here we're allowed to use T as the value of the parent part of NT, when
constructing an aggregate of type NT.  But I can't use the type name as
the value of a parameter:

    Insert (Container, New_Item => T);  -- not legal Ada

I have to say something like:

    Insert (Container, New_Item => New_T);

where New_T is a function that returns type T.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 24, 2004  1:39 PM

> It's kind of a pain that the language doesn't give you a default
> constructor for a type that you can pass as a parameter.

..
> Here we're allowed to use T as the value of the parent part of NT, when
> constructing an aggregate of type NT.  But I can't use the type name as
> the value of a parameter:
>
>     Insert (Container, New_Item => T);  -- not legal Ada

True, but Ada 200Y lets you say:

     Insert (Container, New_Item => (<>));

which is a default-initialized aggregate. Which is what you want, right??

(See AI-287.)

We originally tried to use the type name here, but it led to all kinds of
problems, and it isn't providing any actual information, so we decided to
use the box "<>" instead.

So all you really want is an Ada 200Y compiler. :-)

****************************************************************

From: Gary Dismukes
Sent: Tuesday, February 24, 2004  3:13 PM

> This won't work, because the vector type privately derives from
> Controlled, and therefore you can't declare a constant of the type in a
> package with preelaborate categorization.

Not completely true.  In Ada 200Y you can make a private type have
preelaborable initialization, in which case constants of the type
can be declared in preelaborable packages (see AI-161).  The types
Ada.Finalization.Controlled and Limited_Controlled are defined
to have preelaborable initialization, though there's a restriction
that if a user-defined controlled type overrides Initialize then
the type doesn't have preelaborable initialization.
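
For illustration, a sketch of what that would permit (the Ada 200Y pragma
name and the package below are assumptions for illustration, not the actual
AI text):

   with Ada.Finalization;
   package Example is
      pragma Preelaborate;

      type Vector_Type is private;
      pragma Preelaborable_Initialization (Vector_Type);

      Empty_Vector : constant Vector_Type;   --  legal, given PI

   private
      type Vector_Type is new Ada.Finalization.Controlled with null record;
      --  No Initialize override, so the type keeps the preelaborable
      --  initialization inherited from Controlled.

      Empty_Vector : constant Vector_Type :=
        (Ada.Finalization.Controlled with null record);
   end Example;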

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 24, 2004  4:15 PM

OK.  Thanks for the info.

The vector and (hashed) map containers don't override the Initialize
operation.

The (sorted) set does override Initialize.  Let me see if I can get rid
of that.

It might not matter anyway, since we can use the new "(<>)" notation to
construct an anonymous instance of the type.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 24, 2004  4:27 PM

I just got rid of the override of Initialize for the set.  The full view
of Set_Type now looks like:

    function New_Back return Node_Access;

    type Set_Type is new Controlled with record
       Tree : Tree_Type := (Back => New_Back, Length => 0);
    end record;

The function New_Back does the allocation and initialization that I was
doing in Initialize.

I'll fold this change into the next release of the reference implementation.

****************************************************************

From: Matthew Heaney
Sent: Friday, February 27, 2004  12:29 PM

I just uploaded the latest version of the reference implementation:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040227.zip>

This version includes indefinite forms for all containers.  There are
also two more anagram examples, and a new genealogy example.
****************************************************************

From: Tucker Taft
Sent: Friday, February 27, 2004  4:07 PM

I had a couple of problems compiling this.
One problem is that you have two versions of package "String_Vectors",
one in the top-level dir, and one in the indefinite_vectors subdirectory.
You might want to delete the indefinite_vectors subdirectory, since it
is redundant with the ai302-containers-indefinite_vectors stuff, and
it is confusing because one uses "Natural" where the other uses "Size_Type."

The other problem I had was with your "Control_Type" in the
private part of indefinite_vectors/indefinite_vectors.ads.  Again,
this is largely redundant with ai302-containers-indefinite_vectors.
But for what it is worth, the former one doesn't compile with
our latest compiler, because the type declaration:

   type VT is new Rep_Types.Vector_Type;

fails with complaints about trying to add primitive operations
after a type is frozen.  It is a bit subtle, but this type
declaration is in fact implicitly declaring additional operations
on "Control_Type" *after* Control_Type has been passed to a generic.

The solution I came up with was putting the declaration of
Control_Type into a nested package ("Inner") starting at the
declaration of Control_Type and ending after the generic
instantiation producing Rep_Types.  Then the declaration
of VT is outside the (inner) package, meaning that the additional
operations it implicitly declares with parameters of type
Control_Type don't end up as primitives of Control_Type.
A corresponding change is needed in the body of Indefinite_Vectors.
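
A rough sketch of the restructuring being described (the names Inner and
Rep_Vectors, and the placeholder record component, are assumptions, not the
actual source):

   package Inner is
      type Control_Type is new Ada.Finalization.Controlled with record
         Dummy : Natural := 0;  --  placeholder; the real type holds a pointer
      end record;
      package Rep_Types is new Rep_Vectors (Element_Type => Control_Type);
   end Inner;

   --  Declared outside Inner, so the subprograms this derivation implicitly
   --  declares (with Control_Type parameters) are not primitives of
   --  Control_Type:
   type VT is new Inner.Rep_Types.Vector_Type;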

In any case, ai302-containers-indefinite_vectors.ad? doesn't
have this problem -- you use a different approach.

I'll let you know about any other problems I encounter.

Very nice work, in any case!

****************************************************************

From: Matthew Heaney
Sent: Friday, February 27, 2004  4:43 PM

I wasn't sure whether I still needed the old indefinite_xxx subdirectories.

Those were originally created to show how to implement an indefinite
container as a thin layer on top of the official definite containers.

However, after I did that Randy suggested that having indefinite forms
as an official part of the library might be acceptable, so I went ahead
and implemented them, up in the parent directory.

I can either remove them entirely from the release, or move them off
into a deprecated subdirectory.

I suppose a README couldn't hurt, either...


> The other problem I had was with your "Control_Type" in the
> private part of indefinite_vectors/indefinite_vectors.ads.
...

OK.  That's easy enough to fix.  (I don't really need that derived type.
  It was only declared to effect transitivity of visibility.)


> The solution I came up with was putting the declaration of
> Control_Type into a nested package ("Inner") starting at the
> declaration of Control_Type and ending after the generic
> instantiation producing Rep_Types.  Then the declaration
> of VT is outside the (inner) package, meaning that the additional
> operations it implicitly declares with parameters of type
> Control_Type don't end up as primitives of Control_Type.
> A corresponding change is needed in the body of Indefinite_Vectors.

OK.  Thanks for the tip.


> In any case, ai302-containers-indefinite_vectors.ad? doesn't
> have this problem -- you use a different approach.

Indeed.  That version is implemented natively, not as a thin layer.

The versions in the parent directory are the only ones you really care
about.  I can move those other versions to somewhere less confusing.


> I'll let you know about any other problems I encounter.

OK, thanks.  I can fold any changes into the next release.

I'll be at the meeting in Phoenix, so we can discuss any other issues
you have.


> Very nice work, in any case!

Thanks.  I was able to build the reference implementation from the spare
parts I had lying around for Charles, so it was a big job but not that big.

I was just thinking today that it would be nice to have a functional
insertion operation, like this:

--see wordcount.adb
declare
    N : Natural renames Insert (Map'Access, Word, 0).all;
begin
    N := N + 1;
end;

or like this:

--see genealogy.adb
declare
    Roots : Set_Type renames Insert (Map'Access, Key => "---").all;
begin
...

This simulates what I can do in C++ using operator[]().

One way to declare it is:

    function Insert
      (Map : access Map_Type;
       Key :        String) return access Element_Type;

I was thinking the cursor selectors could be declared like this:

    function To_Element
       (Cursor : Cursor_Type) return access Element_Type;

    function To_Key
      (Cursor : Cursor_Type) return access constant Key_Type;

If functions could return an anonymous access type this would allow me
to get rid of the Generic_Element and Generic_Key functions.

Just some ideas...

****************************************************************

From: Dan Eilers
Sent: Saturday, February 28, 2004  2:14 PM

  In ai302/test_sets.adb, on line 91, there is a call to
"find" that appears to be ambiguous, matching the find
declared in test_sets.adb on line 51, and the find
declared in integer_vectors.

****************************************************************

From: Adam Beneschan
Sent: Monday, March  1, 2004  6:33 PM

...
> fails with complaints about trying to add primitive operations
> after a type is frozen.  It is a bit subtle, but this type
> declaration is in fact implicitly declaring additional operations
> on "Control_Type" *after* Control_Type has been passed to a generic.

Can this be right?  Essentially the source is equivalent to:

   generic ...
   package Indefinite_Vectors is

   private

      type Control_Type is new Controlled with record ... end record;

      package Rep_Types is
         type Vector_Type is private;
         procedure Append (Vector   : in out Vector_Type;
                           New_Item : in     Control_Type);
      private ...
      end Rep_Types;

      type VT is new Rep_Types.Vector_Type;

   end Indefinite_Vectors;

The derived type declaration causes a new inherited subprogram to be
declared implicitly:

   procedure Append (Vector   : in out VT;
                     New_Item : in     Control_Type);

But as I read RM 3.2.3 and particularly 3.2.3(4), the derived
subprogram Append is a primitive subprogram of type VT, but *not* a
primitive subprogram of type Control_Type.  So there shouldn't be an
error message about primitive subprograms being added after
Control_Type is frozen (even if there were some declaration that froze
Control_Type before the declaration of VT, which there isn't in my
reduced example).

Also, 3.9.2(13) makes "the explicit declaration of a primitive
subprogram of a tagged type" illegal after the type is frozen, but
this is not an explicit subprogram declaration.

So what did I miss?

****************************************************************

From: Randy Brukardt
Sent: Monday, March  1, 2004  6:57 PM

...
> But as I read RM 3.2.3 and particularly 3.2.3(4), the derived
> subprogram Append is a primitive subprogram of type VT, but *not* a
> primitive subprogram of type Control_Type.

Humm. This looks messy. Primitive subprograms have to be explicitly declared
for initial types. But 3.2.3(4) says that inherited routines are primitive
for derived types. It doesn't say that routines inherited *from the parent
type* are primitive. In this case, Control_Type is derived, so inherited
routines are primitive -- and this routine is certainly inherited.

Of course, that seems to be a nonsense interpretation of the language. I
think that 3.2.3(4) was intended to apply only to routines inherited from
the parent. So the question is whether that can be derived from other
language (in which case Tucker's compiler has a bug), or if there is
actually a language hole.

****************************************************************

From: Adam Beneschan
Sent: Monday, March  1, 2004  7:26 PM

> Humm. This looks messy. Primitive subprograms have to be explicitly declared
> for initial types. But 3.2.3(4) says that inherited routines are primitive
> for derived types. It doesn't say that routines inherited *from the parent
> type* are primitive. In this case, Control_Type is derived, so inherited
> routines are primitive -- and this routine is certainly inherited.

The exact language of 3.2.3(2,4) is:

   The primitive subprograms of a specific type are defined as
   follows:

   For a derived type, the inherited (see 3.4) user-defined
   subprograms;

So we refer to 3.4 to see what it says about "inherited user-defined
subprograms".  3.4(17) says, "For each user-defined primitive
subprogram... of the parent type that already exists at the place of
the derived_type_definition, there exists a corresponding _inherited_
primitive subprogram of the derived type with the same defining name".
The primitive subprograms of the parent type that exist at the time
Control_Type is defined are those that exist for Control_Type's parent
type, Ada.Finalization.Controlled, namely Initialize, Finalize,
Adjust.

So to me, those are "the inherited user-defined subprograms" to which
3.2.3(4) refers.  I've always interpreted it that way, just from the
language of those two sections, independently of any other language in
the RM or of any conclusion that a different interpretation would be
nonsense.

> Of course, that seems to be a nonsense interpretation of the language. I
> think that 3.2.3(4) was intended to apply only to routines inherited from
> the parent.

I agree.  I personally think the intent is already clear from the RM.

****************************************************************

From: Randy Brukardt
Sent: Thursday, April 29, 2004  9:59 PM

I've just posted the updated Container library AI. [This is version /03.] This
was updated to reflect the conclusions of the six hours of discussion (which
was a record for a single AI) at the Phoenix meeting.

I'm happy to say that most of the suggestions made here were implemented in
some way. Indefinite element containers were added, as well as a list
container. Set operations were added to the set package. Iteration was
changed somewhat to be more familiar to Ada programmers. The operations and
their semantics were made more regular.

Comments are welcome. (But please remember that I have to read and file all
of them for the permanent record, so try to take the long-winded discussions
of philosophy to comp.lang.ada. :-)

****************************************************************

From: Pascal Obry
Sent: Friday, April 30, 2004  1:26 AM

That's great news! Congratulations to all for the hard work on this issue.

****************************************************************

From: Marius Amado Alves
Sent: Friday, April 30, 2004  3:04 PM

> I've just posted the updated Container library AI...

Excellent!

Just a tiny comment at this time: the names Indefinite_Vectors, etc. do not
sound right to me, because the element type is indefinite, not the
containers. Alternatives:

1. Containers.Indefinite_Elements.Vectors
2. Containers.Vectors_Of_Indefinite_Elements
3. Containers_Of_Indefinite_Elements.Vectors

("Indefinite_Elements" is not literally correct either because the type, not
the elements, is indefinite. But it is a common idiom to say "things" in
place of "thing type".)

I think I like 3.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, April 30, 2004  8:20 AM

Everybody talks about a real vector, or a complex matrix. Doesn't seem
to hurt the mathematicians...

****************************************************************

From: Marius Amado Alves
Sent: Friday, April 30, 2004  11:21 AM

Vector of real numbers... real vector.
Vector of elements of indefinite type... vectors of indefinite elements...
indefinite vector.
Ok, I think the ears will get accustomed.

****************************************************************

From: Jeffrey Carter
Sent: Friday, April 30, 2004  5:27 PM

> Comments are welcome. (But please remember that I have to read and file all
> of them for the permanent record, so try to take the long-winded discussions
> of philosophy to comp.lang.ada. :-)

Perhaps I'm missing something, but I don't see why the vector component
needs the assertion anymore. If it's not needed, it would be nice to
eliminate it.

****************************************************************

From: Dan Eilers
Sent: Friday, April 30, 2004  6:25 PM

  Some typos:

> All containers are non-limited, and hence allow ordinary assignment.  In
> the unique case of a vector, there is a separate assignment procedure:
>
>    Assert (Target => V1, Source => V2);
     ^^^^^^

> The reason is that the model for a vector is that it's implemented using
> an unconstrained array. During ordinary assignment, the internal array
> is deallocated (during controlled finalization), and then a new internal
> [array] is allocated (during controlled adjustment) to store a copy of the
  ^^^^^^^

"is may not"

hat the average bucket

caching *effects

arbitary
conbined
evalution
exmples
Generic_Revserse_Find
heirarchy
Indefinited_Hashed_Maps
insuffiently
machinary
simplied
stratgies
sucessful

****************************************************************

From: Christoph Grein
Sent: Thursday, May 6, 2004  4:07 AM

A few more typos:

specify precisely where this will happen (it will happen no lat{t}er than the
                                                               ^^^

AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are
[the] only ways that an element can change from empty to non-empty.
^^^^^

Any exceptions raising during element assignment
               raised (as everywhere else)

cursor designates with a[n] index value (or a cursor designating an element at
                        ^^^

declared in Containers.Vectors with a[n] ambiguous (but not invalid, see below)
                                     ^^^

but it is {is} a *different* element
          ^^^^
****************************************************************

From: Marius Amado Alves
Sent: Thursday, May  6, 2004  3:07 PM

What happened to the Lower_Bound, Upper_Bound and "insert with hint"
operations for sets? They were very useful. Is there a way to make the same
kind of searches/updates with the new spec?

Furthermore, often the user already has a cursor value for an element that he
knows is a bound for another search he wants to make. It should be possible
to use this information to improve the search.

Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
providing something similar, albeit more restrictive (the known cursor had to
be adjacent). The current version does not have even this.

/* I found these requirements in real world situations, namely writing a
database system that uses large sets to store some things. */

At least implementation permissions/advice should exist allowing/encouraging
implementations to provide these optimized search/update operations, namely
via operations with the standard profiles except for additional "hint"
parameters. Better yet, make a number of these optimized profiles standard,
permitting the actual optimization to be null, to assure portability.

****************************************************************

From: Matthew Heaney
Sent: Monday, May 10, 2004  6:48 PM

> What happened to the Lower_Bound, Upper_Bound and "insert with hint"
> operations for sets? They were very useful. Is there a way to make the same
> kind of searches/updates with the new spec?

I tried to keep them, but I argued badly and hence lost that vote.

I discussed restoring these operations (Lower_Bound and Upper_Bound)
with Randy, and he said I'd have to post a message on ada-comment
justifying why those operations are needed.  (You have to do it that way
since the entire ARG voted at the last meeting, and you have to give them
the opportunity to reconsider their decision during the next meeting.)

It's good that you're asking about this, since that's evidence that
there is interest in these operations from someone other than me.

It would be helpful if you could post a follow-up message on ada-comment
giving a specific example of why you need LB and UB.  The ARG can then
put this on the agenda for the ARG meeting in Palma.


> Furthermore, often the user already has a cursor value for an element that he
> knows is a bound for another search he wants to make. It should be possible
> to use this information to improve the search.

That's similar to insert-with-hint.  However, the ARG members weren't
persuaded by my defense of optimized insertion.


> Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
> providing something similar, albeit more restrictive (the known cursor had to
> be adjacent). The current version does not have even this.

Yes, that is correct.  Personally I can live without the
insert-with-hint operations (because you have insert-sans-hint), but I
think removing the Lower_Bound and Upper_Bound operations was a mistake,
since that leaves no way to find the set element nearest some key.  All
you have now is basically a membership test, which is too coarse a
granularity.

For example, someone on CLA had an ordered set of integers, and he
wanted to iterate over the values in [0, 1000), then from [1000, 2000),
etc.  Without Lower_Bound there's no way to do that.
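
For concreteness, a sketch of that example using the dropped operation (the
names Integer_Sets, Set_Type, Cursor_Type, Back, and Increment follow the
earlier /1.1-style spec, and Process is an assumed procedure):

   --  Visit the elements in [First, Last).
   procedure Process_Range
     (S : Integer_Sets.Set_Type; First, Last : Integer)
   is
      use Integer_Sets;
      I : Cursor_Type := Lower_Bound (S, First);
   begin
      while I /= Back (S) and then Element (I) < Last loop
         Process (Element (I));
         Increment (I);
      end loop;
   end Process_Range;

   --  Process_Range (S, 0, 1000);  Process_Range (S, 1000, 2000);  ...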


> /* I found these requirements in real world situations, namely writing a
> database system that uses large sets to store some things. */

Please post an example of what you're trying to do, and show how it
can't be done without Lower_Bound and Upper_Bound.


> At least implementation permissions/advice should exist allowing/encouraging
> implementations to provide these optimized search/update operations. Namely
> via operations with the standard profiles except having additional "hint"
> parameters. Better yet make a number of these optimized profiles standard,
> permitting the actual optimization to be null. To assure portability.

Give an example of why you need Lower_Bound and Upper_Bound, and request
that the ARG put it on the agenda for Palma.

Some other possibilities are:

procedure Find
   (Container : in     Set;
    Key       : in     Key_Type;
    Position  :    out Cursor;
    Success   :    out Boolean);

If the Key matches, then Success=True and Position.Key = Key.
Otherwise, Success=False and Key < Position.Key.

Technically you don't need that, since you can test the result of
Lower_Bound:

    C : Cursor := Lower_Bound (Set, Key);
begin
    if Key < Element (C) then
       null;  --C denotes the next (successor) neighbor
    else
       null;  --C denotes the node containing Key
    end if;


Another possibility is to name it something like "Ceiling" or whatever.

An additional possibility is something (STL) like:

procedure Equal_Range
   (Container   : in     Set;
    Key         : in     Key_Type;
    Lower_Bound :    out Cursor;
    Upper_Bound :    out Cursor);

Then you can test:

    Lower_Bound = Upper_Bound => key not found
    Lower_Bound /= Upper_Bound => found

This latter operation has the benefit of working with multisets too.

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  6:55 AM

Upon Heaney's advice, I'll detail the case for optimized operations for
sets.

I use Lower_Bound in the implementation of Mneson, in at least four
subprograms, excerpted below. For the entire code see
www.liacc.up.pt/~maa/mneson. Mneson is a database system based on a directed
graph implemented as a set of links.

Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for
Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set
are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then
by the 2nd. Front_Vertex is an unlinked vertex value lower than any other.

   procedure Delete_All_In_Range
     (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
   is
      use Link_Sets;
      I : Cursor_Type := Lower_Bound (Link_Set, From);
   begin
     while I /= Back (Link_Set) loop
       exit when Element (I) > To;
       Delete (Link_Set, I);
     end loop;
   end;

   procedure For_Each_Link_In_Range
     (Set : Link_Sets.Set_Type; From, To : Link_Type)
   is
      use Link_Sets;
      I : Cursor_Type := Lower_Bound (Set, From);
      E : Link_Type;
   begin
      I := Lower_Bound (Set, From);
      while I /= Back (Set) loop
         E := Element (I);
         exit when E > To;
         Process (E);
         Increment (I);
      end loop;
   end;

   function Connected (Source : Vertex) return Boolean is
      use Link_Sets;
   begin
      return
         Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
   end;

   function Inv_Connected (Target : Vertex) return Boolean is
      use Link_Sets;
   begin
      return
         Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor;
   end;

I'm also developing optimized algorithms for set intersection that require
not only Lower_Bound but also search with hint (known bounds), and
eventually Upper_Bound. These are still on the drawing board, but I already
know at this point that they require those operations. Soon I'll have some
code, but it's rather complicated, because Mneson sets can be of various
kinds, extensional and intensional, the basic extensional being a designated
vertex whose targets are the elements, the intensional being a dedicated
"selection" structure, designed for lazy evaluation, with elements being
represented in several ways, and materialized only upon certain operations
like iteration and extraction.

My interest is databases. At least here, ordered sets are an incredibly
useful thing. Pretty much every interesting database function can be defined
in terms of them. In a graph-based implementation like Mneson, set
intersection is crucial.

The spec now has the full set algebra (union, intersection, differences,
etc.) That is good, and if their performance were ideal for all purposes,
I'd be silent.

But I know their performance cannot be ideal in many situations, because I
know optimization techniques that require more than what the spec now
offers. Namely they require search with hint and/or Lower_Bound.

And anyway the spec does not specify performance for them (only for Insert,
Find, Element).

Also note that the Find operations for Vectors and Hashed_Maps are kind of
hintful, so it's only fair that Ordered_Sets have these versions too.

For databases, performance is paramount. Even apparently small gains matter.
"Apparently" because many database functions scale worse than lineary, e.g.
cross products. Optimization is these cases is a must. In many cases the
optimization makes all the difference (between feasible and unfeasible).

Optimization is invariably based on knowledge the system prepares about the
sets in the expression queried for computation. The preparation time is
usually negligible. In a system implemented with Ada.Containers, a great part
of the prepared knowledge is ultimately expressed as cursor values for known
element value bounds for the sought element ranges.

Ordered_Sets implementations are likely to be able to take advantage of this
knowledge for improving time performance (the previous AI302 "insert with
hint" is an example).

Therefore it is required that this knowledge can be passed to the basic
operations.
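
One possible shape for such a profile (an assumption for discussion, not
proposal text):

   --  Sketch only: Find with a hint cursor.  The Implementation Advice
   --  would be that the search may start near Hint; ignoring Hint (a null
   --  optimization) remains conforming, which preserves portability.
   function Find
     (Container : Set;
      Item      : Element_Type;
      Hint      : Cursor) return Cursor;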

Immodestly assuming I've made a convincing case, I can inform you that Heaney
and I have solid ideas on what the operations should look like, and we
are ready to prepare a pretty open-and-shut proposal for Palma. I myself will be
there from Sunday to Sunday, and happily available for discussion. To me the
most promising format is *prescribing* hintful versions of Find et al. but
with only *advised* performance, i.e. allowing null optimization.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, May 11, 2004  12:37 PM

> Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for
> Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set
> are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then
> by the 2nd. Front_Vertex is an unlinked vertex value lower that any other.
>
>    procedure Delete_All_In_Range
>      (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
>    is
>       use Link_Sets;
>       I : Cursor_Type := Lower_Bound (Link_Set, From);
>    begin
>      while I /= Back (Link_Set) loop
>        exit when Element (I) > To;
>        Delete (Link_Set, I);
>      end loop;
>    end;

You might want to vet From and To, to assert that they're in order.  It
also looks like you mean to delete the node designated by To (this is
apparently a closed range), which means you could use Upper_Bound to
find the endpoint of the range:

   procedure Delete_All_In_Range
     (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
   is
      pragma Assert (From <= To);

      use Link_Sets;
      I : Cursor_Type := Lower_Bound (Link_Set, From);
      J : constant Cursor_Type := Upper_Bound (Link_Set, To);
   begin
     while I /= J loop
       Delete (Link_Set, I);
     end loop;
   end;


>    procedure For_Each_Link_In_Range
>      (Set : Link_Sets.Set_Type; From, To : Link_Type)
>    is
>       use Link_Sets;
>       I : Cursor_Type := Lower_Bound (Set, From);
>       E : Link_Type;
>    begin
>       I := Lower_Bound (Set, From);  --???
>       while I /= Back (Set) loop
>          E := Element (I);
>          exit when E > To;
>          Process (E);
>          Increment (I);
>       end loop;
>    end;

This again appears to be a closed range, so I recommend using
Upper_Bound to find the endpoint:

   procedure For_Each_Link_In_Range
     (Set : Link_Sets.Set_Type; From, To : Link_Type)
   is
      pragma Assert (From <= To);

      use Link_Sets;
      I : Cursor_Type := Lower_Bound (Set, From);
      J : constant Cursor_Type := Upper_Bound (Set, To);
   begin
      while I /= J loop
         Process (Element (I));
         Increment (I);
      end loop;
   end;

Alternatively, you could use the new Generic_Update procedure:

   procedure For_Each_Link_In_Range
     (Set : Link_Sets.Set_Type; From, To : Link_Type)
   is
      pragma Assert (From <= To);

      use Link_Sets;

      procedure Process (E : in out Link_Type) is
      begin
         ...; --whatever
      end;

      procedure Update is new Generic_Update;

      I : Cursor_Type := Lower_Bound (Set, From);
      J : constant Cursor_Type := Upper_Bound (Set, To);
   begin
      while I /= J loop
         Update (I);
         Increment (I);
      end loop;
   end;

(Note that I only have the vectors done in the reference implementation.)


>    function Connected (Source : Vertex) return Boolean is
>       use Link_Sets;
>    begin
>       return
>          Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
>    end;


Lower_Bound will only return Null_Cursor if the value is greater than
every element in the set.  So it looks like you're testing whether the
value is less than or equal to an element in the set.  There are
probably other ways to implement this predicate function, for example:

   function Connected (Source : Vertex) return Boolean is
      use Link_Sets;
   begin
      if Is_Empty (Links) then
          return False;
      end if;

      return Link_Type'(Source, Front_Vertex) <= Last_Element (Links);
   end;


>    function Inv_Connected (Target : Vertex) return Boolean is
>       use Link_Sets;
>    begin
>       return
>          Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor;
>    end;

Ditto for this function.

The moral here is you don't need Lower_Bound if all you do is throw away
its result.

However, it looks like in the first two examples, you have a legitimate
need for Lower_Bound (and arguably Upper_Bound, too).

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  1:20 PM

> ...
> >    function Connected (Source : Vertex) return Boolean is
> >       use Link_Sets;
> >    begin
> >       return
> >          Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
> >    end;
>
>
> Lower_Bound will only return Null_Cursor if the value is greater than
> every element in the set.

Oops, this was a bug. Thanks a lot for catching it. What I must have meant
is:

   X := Lower_Bound (Links, (Source, Front_Vertex));
   return X /= Null_Cursor and then Element (X) (1) = Source;
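
[Editor's sketch: the correction above assembled into a complete body, reusing
the declarations from the Connected function quoted earlier; shown only for
reference.]

   function Connected (Source : Vertex) return Boolean is
      use Link_Sets;
      X : constant Cursor_Type :=
        Lower_Bound (Links, (Source, Front_Vertex));
   begin
      --  Lower_Bound finds the first link at or after (Source, Front_Vertex);
      --  the second test checks that it really belongs to Source.
      return X /= Null_Cursor and then Element (X) (1) = Source;
   end Connected;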

Thanks a lot for the other suggestions too. I won't be applying them yet
because if-it-works-dont-fix-it, but I've certainly queued them in the
Mneson "to do" list.

> ...it looks like in the first two examples, you have a legitimate
> need for Lower_Bound (and arguably Upper_Bound, too).

Yes. And these, unlike the specific version of Connected above, are used and
tested.

(It seems the specific version of Connected above had not been used yet, which
accounts for its fault not being detected. It's there in the library because
when I wrote the library it looked like it would be necessary. Thanks to you,
if and when it is needed, it will be flawless. Thanks again.)

****************************************************************

From: Tucker Taft
Sent: Tuesday, May 11, 2004  2:17 PM

I think I missed the beginning of this discussion,
but I would agree with the suggestion for using
Floor and Ceiling rather than Lower_Bound and Upper_Bound,
to find the nearest element of the set no greater
(or no less, respectively) than a given value.
And I agree they would be useful operations on an ordered set.

Lower_Bound and Upper_Bound seem more likely to refer to the
minimum and maximum elements of the entire set.

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  3:31 PM

> ... I would agree with the suggestion for using
> Floor and Ceiling...

Good. One of the proposals I'm discussing with Matt has indeed

   function Floor (Item : Element_Type) return Cursor;
   function Ceiling (Item : Element_Type) return Cursor;

where Ceiling = Lower_Bound, but Floor /= Upper_Bound, Floor =
Reverse_Lower_Bound.

(Here Lower_Bound and Upper_Bound are the functions defined in version 1.1
of the spec, which were dropped in the current one. Reverse_Lower_Bound is a
fictitious function like Lower_Bound but in reverse order.)

The proposal also has

   function Slice (Container : Set; Low, High : Cursor) return Set;
   function Open_Bound (Position : Cursor) return Cursor;

The four functions provide a complete search optimization framework. The
main idea is that a slice can be used to convey range and/or optimization
information to any operation.

Slice returns the subset of Set consisting of the elements of Set that are
in the specified interval.

Open_Bound returns a cursor marked as an open bound when used in Slice. An
unmarked cursor represents a closed bound.

The integer set example, namely to iterate over the values in [0, 1000),
then [1000, 2000), etc., becomes:

   procedure Iterate is new Generic_Iteration;
begin
   Iterate (Slice (Integer_Set, Floor (0), Open_Bound (Ceiling (1000))));
   Iterate (Slice (Integer_Set, Floor (1000), Open_Bound (Ceiling (2000))));

Also, with this framework, Upper_Bound (Set, Item) can be realised
functionally as:

First
  (Slice
     (Set,
      Open_Bound (Ceiling (Set, Item)),
      Last (Set)))

So no need for Upper_Bound.

The only sensitive aspect of this framework is the use of a slice as an
object of update operations (Insert, etc.) A slice is likely to be best
represented as a 'virtual' set, i.e. only a 'view' to the corresponding
subset of its 'ground' container. We are currently checking whether and how
and which update operations can process this virtual object properly.
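
[Editor's sketch: one possible representation of the 'virtual' slice view
described above; Slice_Type, Set_Access and the component names are purely
illustrative assumptions, not part of any draft.]

   type Set_Access is access all Set;

   type Slice_Type is record
      Ground              : Set_Access;        --  the 'ground' container
      Low, High           : Cursor;            --  bounds within Ground
      Open_Low, Open_High : Boolean := False;  --  Open_Bound markings
   end record;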

****************************************************************

From: Matthew Heaney
Sent: Tuesday, May 11, 2004  5:13 PM

> Lower_Bound and Upper_Bound seem more likely to refer to the
> minimum and maximum elements of the entire set.

As Mario has pointed out, Ceiling is equivalent to Lower_Bound.

There is no function that corresponds to a Floor function in the STL,
Charles, or earlier releases of the AI-302 draft.  I did discuss how to
implement a floor function in the examples section of earlier drafts, as
follows:

    Floor (S, K) = Previous (Upper_Bound (S, K))

Here Floor is derived from Upper_Bound.  In the most recent draft, for
subtle reasons you have to implement Floor as:

    function Floor (S, K) return Cursor is
       C : Cursor := Upper_Bound (S, K);
    begin
       if C = No_Element then
          return Last (S);
       else
          return Previous (C);
       end if;
    end;

To derive Upper_Bound from Floor, I think it would be:

    function Upper_Bound (S, K) return Cursor is
       C : Cursor := Floor (S, K);
    begin
       if C = No_Element then
         return First (S);
       else
         return Next (C);
       end if;
    end;

To iterate over the half-open range [K1, K2), where K1 <= K2, I think
you would have to write:

    declare
       I : Cursor := Ceiling (S, K1);
       J : Cursor := Floor (S, K2);
    begin
       if J = No_Element then
          J := First (S);
       end if;

       while I /= J loop
           ...
           Next (I);
       end loop;
    end;

However, this seems a little awkward.  (Assuming my analysis is correct.
I have to think about whether [K1, K2) is a closed range or a half-open
range.  Mario's example was a closed range.)

What we really need is something to complement Ceiling, something like
"Strict_Ceiling" or "Proper_Ceiling", e.g.

    declare
       I : Cursor := Ceiling (S, K1);         -- K1 <= I.Key
       J : Cursor := Proper_Ceiling (S, K2);  -- K2 < J.Key
    begin
       while I /= J loop ...;
    end;

Is there a technical term for "proper ceiling"?  I want a function that,
given a key, returns the smallest key greater than that key.  (That's
what function Upper_Bound returns, but that name seems to be confusing
to people unfamiliar with the STL.)
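
[Editor's sketch: one way to obtain the "proper ceiling" being asked about,
derived from Ceiling itself; it assumes the draft's Ceiling, Element,
Has_Element and procedure Next, and the name Proper_Ceiling is hypothetical.]

   function Proper_Ceiling (S : Set; E : Element_Type) return Cursor is
      C : Cursor := Ceiling (S, E);   --  smallest element >= E
   begin
      --  An exact match is skipped, so the result designates the
      --  smallest element greater than E (or No_Element if none).
      if Has_Element (C) and then not (E < Element (C)) then
         Next (C);
      end if;
      return C;
   end Proper_Ceiling;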

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, May 11, 2004  5:19 PM

Two corrections:


Slice returns the subset of *Container* consisting of the elements of
*Container* that are in the specified interval (not Set, that's the type).


Upper_Bound (S, Item) =
First
  (Slice
     (S,
      Open_Bound (*Floor* (S, Item)),
      Last (S)))

(S instead of Set because that's the type name, and Floor, not Ceiling)


Sorry.


BTW, Reverse_Upper_Bound (S, Item) =

Last
  (Slice
     (S,
      First (S),
      Open_Bound (Ceiling (S, Item)))).

Also,

   S = Slice (S, First (S), Last (S))

should always hold.

Currently thinking about the special cases, namely those with occurrences of
No_Element.

And about the slice-for-update problem: easily solved with a specification
similar to the current one for invalid cursors, given that a slice is
expressed as cursor values.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  3:15 AM

> Here Floor is derived from Upper_Bound.  In the most recent draft, for
> subtle reasons you have to implement Floor as:
>
>     function Floor (S, K) return Cursor is
>        C : Cursor := Upper_Bound (S, K);
>     begin...

But you don't have Upper_Bound in the most recent draft!

> What we really need is something to compliment Ceiling, something like
> "Strict_Ceiling" or "Proper_Ceiling", e.g.

I'm against strange things in the spec. Give the user only well known
concepts. A complete set of primitive well known concepts. Ceiling, Floor,
Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants.

> Is there a technical term for "proper ceiling"?

Smallest_Greater_Than :-) But then to be complete you also need
Greatest_Smaller_Than. But then again, don't give strange things.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  3:44 AM

>    procedure Delete_All_In_Range
>      (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
>    is
>       pragma Assert (From <= To);
>
>       use Link_Sets;
>       I : Cursor_Type := Lower_Bound (Link_Set, From);
>       J : constant Cursor_Type := Upper_Bound (Link_Set, To);
>    begin
>      while I /= J loop
>        Delete (Link_Set, I);
>      end loop;
>    end;

My impression is that the original version is more efficient, because it
only calls a search function once (Lower_Bound). Your version makes two
calls (Lower_Bound, Upper_Bound). I assume these operations have O(log n)
time performance, and the others (Back, Element, Delete) constant time. But
my version calls these more times. So I'd have to check the absolute times.

This also provides an example for optimized search with Slice. Because I
know the upper bound must be above the lower, I could pass this information
to Upper_Bound:

   J : constant Cursor_Type :=
     Upper_Bound (Slice (Link_Set, I, Last (Link_Set)), To);

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  9:33 AM

>>What we really need is something to compliment Ceiling, something like
>>"Strict_Ceiling" or "Proper_Ceiling", e.g.
>
> I'm against strange things in the spec. Give the user only well known
> concepts. A complete set of primitive well known concepts. Ceiling, Floor,
> Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants.

Does Upper_Bound qualify as a "well known concept"?  All I'm trying to
do is come up with another name for Upper_Bound.

>>Is there a technical term for "proper ceiling"?
>
> Smallest_Greater_Than :-) But then to be complete you need also
> Greatest_Smaller_Than. But then again, don't give strange things.

But Upper_Bound isn't a strange thing.

I suspect Stepanov was motivated by the set-theoretic terms "upper
bound," "least upper bound", etc.  But I think it's that conflation to
which Tucker objects.

Other names for Upper_Bound are: limit, supremum, supremum limit, etc.

procedure Op (K1, K2 : Key_Type) is
    I : Cursor := Ceiling (Set, K1);
    J : constant Cursor := Limit (Set, K2);
begin
    while I /= J loop ...;
end;

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  10:16 AM

> Lower_Bound and Upper_Bound seem more likely to refer to the
> minimum and maximum elements of the entire set.

One counter-argument is that both Lower_Bound and Upper_Bound accept a key.

Maybe we could provide these:

Lower_Limit
Floor
Ceiling   (AKA Lower_Bound)
Upper_Limit  (AKA Upper_Bound)

with the following semantics:

Key (Lower_Limit (S, K)) < K

Key (Floor (S, K)) <= K

Key (Ceiling (S, K)) >= K

Key (Upper_Limit (S, K)) > K
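
[Editor's sketch: Lower_Limit written out under the semantics above, in terms
of Ceiling and with the endpoint handled as discussed in the messages that
follow; Upper_Limit would be the mirror image using Floor, Next and First.
The details are illustrative only.]

   function Lower_Limit (S : Set; K : Key_Type) return Cursor is
      C : constant Cursor := Ceiling (S, K);   --  smallest key >= K
   begin
      if Has_Element (C) then
         return Previous (C);   --  largest key < K
      else
         return Last (S);       --  K is above every key in the set
      end if;
   end Lower_Limit;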

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  11:41 AM

I don't find the names Lower_Limit and Upper_Limit
a whole lot better than Lower_Bound/Upper_Bound.

I don't see why you need them.  It seems
Lower_Limit(S,K) = Previous(Ceiling(S,K)) and
Upper_Limit(S,K) = Next(Floor(S,K))

Or am I confused?

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  1:46 PM

No, you got it right, except for the endpoints; see my last message.
For example, if Ceiling(S,K) returns No_Element (because K is large),
then Previous(Ceiling(S,K)) returns No_Element, whereas Lower_Limit
returns Last(S).

We can define the abstraction to have the semantics you describe above,
but I think that requires that (1) the set has an internal sentinel and
(2) type Cursor is privately tagged.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  10:36 AM

> Lower_Limit
> Floor
> Ceiling   (AKA Lower_Bound)
> Upper_Limit  (AKA Upper_Bound)

Better to keep a consistent metaphor: Ground, Floor, Ceiling, Roof. "Limit" is
too abstract.

Alternatives for Ground: Basement, Base... Underworld :-)

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  11:47 AM

> Lower_Limit
> Floor
> Ceiling   (AKA Lower_Bound)
> Upper_Limit  (AKA Upper_Bound)

In mathematics "lower limit" applies to a sequence of values (e.g. the values
of sin (x) with x from zero to infinity), and means the least value of the
sequence. So it's really more similar to First.

[My main source for checking this stuff has been the Wikipedia
(en.wikipedia.org)]

Have you considered my Slice, Open_Bound proposal yet?

Recapitulating:

Ground, Floor, Ceiling, Roof, do not solve the problem of providing search
optimization information to the other operations.

Slice, Open_Bound do.

And Ground, Roof can be derived from Ceiling, Floor, Slice, Open_Bound, First,
Last.

So my proposal is adding Ceiling, Floor, Slice, Open_Bound.

And eventually Ground, Roof defined as "equivalent to..."

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  8:22 AM

Connected *is* a legitimate example of the need for Lower_Bound.

The fixed Connected body is

      X : Cursor_Type := Lower_Bound (Links, (Source, Front_Vertex));
   begin
      return X /= Null_Cursor and then Element (X) (1) = Source;

Matt, your suggestion,

>    function Connected (Source : Vertex) return Boolean is
>       use Link_Sets;
>    begin
>       if Is_Empty (Source) then
>           return False;
>       end if;
>
>       return Link_Type'(Source, Front_Vector) <= Last_Element (Links);
>    end;

won't work. Apart from the obvious bugs Is_Empty (Source) which should be
Is_Empty (Links), and Front_Vector which should be Front_Vertex, the return
expression

   Link_Type'(Source, Front_Vertex) <= Last_Element (Links)

does not yield as desired. Front_Vertex is a value that is never connected
and is lower than any other. Let's say Front_Vertex = 0 and Links = ((2, 3),
(2, 4)). Then Connected (1) would (erroneously) return True, because (1, 0)
<= (2, 4). You're not checking for actual membership in Links. Maybe you had
something else in mind.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  10:30 AM

<<Does[n't] Upper_Bound qualify as a "well known concept"? >>

Not terribly, no.

<<All I'm trying to do is come up with another name for Upper_Bound....
I suspect Stepanov was motivated by the set-theoretic terms "upper
bound," "least upper bound", etc. ...
Other names for Upper_Bound are: limit, supremum, supremum limit, etc.>>

Actually the term "least upper bound" in mathematics is what we have been
calling *Lower_Bound*, or Ceiling. And "greatest lower bound" is Floor. I don't
know a mathematical term for what we have been calling Upper_Bound. Which to me
indicates a bit of strangeness.

Also, Upper_Bound (let's keep calling it that):
- does not seem to be so useful as Ceiling, if at all
- can be derived with First, Last, and Slice
My previous examples demonstrate this.

But the term you're looking for might be: Roof.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  12:40 PM

> Have you considered my Slice, Open_Bound proposal yet?

I don't see the need for "Slice" or "Open_Bound".
These seem to be introducing a layer of "virtual"
set on top, which you could do with a new abstraction.
Is there a real efficiency need here, or just a desire
for the additional abstraction level?

For example, it seems using an Open_Bound as the high
bound of an iteration is equivalent to iterating up to
Previous(Ceiling()).  You can easily create a "real"
slice by iterating from the low bound to the high
bound and insert the result in a new set.  If you want
a "virtual" slice, then to me that is an additional
layer on top, and not something appropriate for the
basic Ordered_Sets abstraction.


...
> So my proposal is adding Ceiling, Floor, Slice, Open_Bound.
>
> And eventually Ground, Roof defined as "equivalent to..."

I don't see the need to go beyond Floor and Ceiling.  They
seem to provide all the primitives needed to enable the
efficient construction of any operations you might want,
and I believe their meaning is more intuitive than the others
you have suggested.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  1:25 PM

> For example, it seems using an Open_Bound as the high
> bound of an iteration is equivalent to iterating up to
> Previous(Ceiling()).

This requires care, since Ceiling can return No_Element if the key is
greater than every key in the set.  To make your algorithm fully general
I think you'd have to say:

declare
    C : Cursor := Ceiling (S, K);
begin
    if Has_Element (C) then
       Previous (C);
    else
       C := Last (S);
    end if;
    ...
end;

> I don't see the need to go beyond Floor and Ceiling.  They
> seem to provide all the primitives needed to enable the
> efficient construction of any operations you might want,
> and I believe their meaning is more intuitive than the others
> you have suggested.

As above, the problem case is when Floor returns No_Element, because the
key is less than every key in the set.  To implement an equivalent of
Upper_Bound, it's not good enough to say Next (Floor (S, K)); you have
to say instead:

declare
    C : Cursor := Floor (S, K);
begin
    if Has_Element (C) then
       Next (C);
    else
       C := First (S);
    end if;
    ...
end;

I don't know whether this is really a problem, but I just wanted to
bring it up.  Having to handle the endpoints as a special case is a
consequence of the fact that we got rid of the internal sentinel node.

Another possibility is to restore the sentinel, and then define rules
for how it compares to the deferred constant No_Element.  Assuming type
Cursor is defined as:

   type Node_Type is record  -- red-black tree node
      Color : Color_Type;
      ...
   end record;

   type Cursor is record
      Node : Node_Access;
   end record;

   No_Element : constant Cursor := (Node => null);

function Has_Element (C : Cursor) return Boolean is
begin
    if C.Node = null then
      return False;
    end if;

    if C.Node.Color = White then -- sentinel has special color
       return False;
    end if;

    return True;
end;

function "=" (L, R : Cursor) return Boolean is
begin
    if L.Node = null
      or else L.Node.Color = White
    then
       return R.Node = null or else R.Node.Color = White;
    end if;

    if R.Node = null
      or else R.Node.Color = White
    then
       return False;
    end if;

    return L.Node = R.Node;
end;

The problem of course is that "=" for type Cursor overrides predefined
"=", which means predefined "=" re-emerges when type Cursor is a record
or array component, or when type Cursor is a generic actual type.

I suppose we could privately tag type Cursor, to guarantee that
predefined "=" never re-emerges.  I was trying to avoid that, however.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  1:47 PM

"Lower_Limit(S,K) = Previous(Ceiling(S,K))"

You mean

   Lower_Limit (S, K) = Previous (Floor (S, K)).

But this fails when Floor (S, K) < K.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  1:56 PM

No.  Tucker was correct.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  2:21 PM

No, I meant what I wrote, based on Matt's specification
that Key(Lower_Limit(S,K)) < K.  I'm not sure you
and Matt have the same definition in mind for all
these functions.  In particular I get the sense
that your definition of Lower_Bound is the opposite
of his.  I understand the notion of Greatest_Lower_Bound
on a lattice, but I have never quite understood how
that relates to Lower_Bound.

In any case, I was focusing on the specifications
that Matt gave for Lower_Limit and Upper_Limit,
and based my equivalence on those.

And I realize my equivalence fails at the end points,
but I suspect that some special handling may be required
for those in any case, and it is easy enough for the
user to define a function that does what is desired
(e.g. Previous_Or_Last() which returns Last when given
No_Element).

> But this fails when Floor (S, K) < K.

That's why I wrote Previous(Ceiling(S,K)).

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  4:30 PM

Oops, sorry. *I* was confused.

By the way, I checked the names so far, and they are (aligned, but in no
specific order):

Version 1.1  Mathematics           My names  Matthew's    AKAs, other
----------------------------------------------------------------------------
Lower_Bound  least upper bound     Ceiling   Ceiling
Upper_Bound                        Roof      Upper_Limit
             greatest lower bound  Floor     Floor
Reverse_Lower_Bound
                                   Ground    Lower_Limit
Reverse_Upper_Bound
             lower limit                                  First
             upper limit                                  Last
----------------------------------------------------------------------------
****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  2:18 PM

> I don't see why you need them.  It seems
> Lower_Limit(S,K) = Previous(Ceiling(S,K)) and
> Upper_Limit(S,K) = Next(Floor(S,K))

Thinking about this issue some more, there might be a way to create
these semantics without a sentinel.  If a cursor is implemented this way:

   type Cursor is record
      Container : Set_Access;
      Node      : Node_Access;
   end record;

In which case you could implement Previous as:

function Previous (C : Cursor) return Cursor is
begin
    if C.Container = null then  --No_Element
       return C;
    end if;

    if C.Node = null then  --pseudo-sentinel
      return C;  --or: Last (C.Container)
    end if;

    if C = First (C.Container) then
       return (C.Container, null);  --pseudo-sentinel
    end if;

    return Previous (C.Container.Tree, C.Node);
end;

Next would be implemented similarly.

The only issue here is that Previous (First (S)) /= No_Element (the LHS
has a non-null set pointer, the RHS has a null set pointer).  I don't
know if this is an issue.

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  2:30 PM

I don't think we need to change
"Previous" to make these equivalences work for
endpoints.   Just let the user write a
"Previous_Or_Last" if they really want to,
which would need to take both a cursor and a set.
Or more directly, write Lower_Limit or Upper_Limit
if you want them, since these already have enough
information with the set and the key.
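
[Editor's sketch: the user-written helper Tucker describes, taking both the
set and a cursor, with No_Element mapping to the last element as he notes
later in this thread; assumes the draft's Has_Element, Previous and Last.]

   function Previous_Or_Last (S : Set; C : Cursor) return Cursor is
   begin
      if Has_Element (C) then
         return Previous (C);
      else
         return Last (S);
      end if;
   end Previous_Or_Last;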

Providing Ceiling and Floor still seems adequate to me,
as they provide the needed primitives for all other
operations mentioned thus far.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, May 12, 2004  2:36 PM

OK.  That seems reasonable.  I just wanted to make sure we were on the
same page w.r.t the behavior at the endpoints.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, May 12, 2004  1:28 PM

<<I don't see the need for "Slice" or "Open_Bound".
These seem to be introducing a layer of "virtual"
set on top, which you could do with a new abstraction.
Is there a real efficiency need here, or just a desire
for the additional abstraction level?>>

Efficiency. Slice is a simple way of passing known bounds to *any* operation.

As an example consider the usual scenario from accounting where you have
invoices, and each invoice has a variable number of items. The relational
representation of this database includes a set Items of (Invoice_Id, Item_Id)
pairs, ordered by (Invoice_Id, Item_Id). You want to insert a new invoice X
with items A and B. Without Slice you do:

   Insert (Items, (X, A), Point_XA, Ok);
   Insert (Items, (X, B), Point_XB, Ok);

Each time Insert will have to search for the insertion point from the start
(e.g. from the root of a binary tree). But clearly Point_XA is close to
Point_XB, so if there was a way of telling Insert that we are inserting (X, B)
next to Point_XA, Insert could start looking from there to great advantage.
Slice provides that way.

   Insert
     (Slice (Items, Point_XA, Last (Items)),
      (X, B), Ok);

You could even save some extra micro-seconds writing:

   Insert
     (Slice (Items, Open_Bound (Point_XA), Last (Items)),
      (X, B), Ok);

[Of course there are other ways, not relational, of representing the data. For
example, Items could be a set of pairs (Invoice_Id, Item_Set), where Item_Set
is a set of items. But there are a number of reasons why you might want the
relational scheme. One is that with this scheme you can search Items by
properties of Item_Id. For example you might want to know which invoices sold
part number 12345. One more subtle reason--not applicable to this example, but
occurring in other common situations--has to do with the unfortunate fact that
it is not possible to have recursive containers without resorting to a pointer
idiom. There are other reasons.]

****************************************************************

From: Tucker Taft
Sent: Wednesday, May 12, 2004  3:07 AM

> Efficiency. Slice is a simple way of passing known
> bounds to *any* operation....

If I understand you, Slice is not a copy, but a by-reference
subset of a set, created for the purpose of improving performance.
I don't find this example sufficiently compelling to include it
in a basic capability like Ordered_Sets.  It requires significant
set up by the user, and it seems possible that in some implementations,
it would be a waste of energy.

I like "Ceiling" and "Floor" because they address the common
notion of "nearest" element or approximate match, something
which makes sense to ask in a set.  Slice and Open_Bound
seem to only serve some more obscure performance concern,
which I don't see of being of wide or general usefulness.

All these things involve subtle tradeoffs, and I accept you
might make different choices, but we are looking to provide
the 20% of all possible set operations that together meet
the needs of 80% of the typical users of sets.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  6:34 AM

> > Efficiency. Slice is a simple way of passing known
> > bounds to *any* operation....
>
> If I understand you, Slice is not a copy, but a by-reference
> subset of a set, created for the purpose of improving performance.

Exactly. It *must not* be a copy.

> I don't find this example sufficiently compelling to include it
> in a basic capability like Ordered_Sets.  It requires significant
> set up by the user, and it seems possible that in some implementations,
> it would be a waste of energy.

The setup is not significant because the user can always ignore the slice
idiom, and/or only use it when the known bounds have been acquired naturally
from previous operations required by the application logic, as was the case
in the invoices example.

The implementation is easy, especially if null optimization is allowed, as I
proposed (a slice obviously knows about its base container, so an
unoptimized operation can just call itself with the base). But in most
implementations, namely using trees or skip lists, the implementation of
non-null optimization is also easy, because usually the internal search
primitives are recursive operations accepting bounds expressed as a node or
nodes, and the Cursor type is likely to have node information, as in Matt's
previous study. So it is not a waste of energy. The implementation is
already there.

And here's one more real life example. Website access analysis. You want to
identify sessions from an HTTP request log file. A session is a sequence of
requests from the same IP such that the time between each consecutive
request does not exceed 30 minutes (this is a common criterion). You want to
update each request with the corresponding (computed) session id. You key
the access log by (IP, Time), and traverse the entire file to effect this
logic. You will have naturally collected bounds for the fine search and
update operations. Rather strict bounds, giving you (in a non-null
optimization implementation) orders of magnitude gains in time. The usual
application of website analysis is for huge log files, of tens of millions
of accesses. The gains could mean the difference between hours and minutes
or seconds. I've done this stuff using database systems (Postgres, MySQL),
and the scripts ran for four hours. I wasn't able to optimize more because
of the same reasons we're discussing here: lack of ways to pass known bounds
to the core data engine. I've done this kind of work in several real life
applications, including
http://soleunet.ijs.si/website/other/final_report/html/WP5-s9.html
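
[Editor's sketch: the session-identification pass described above, written
against a hypothetical instantiation Request_Maps keyed by (IP, Time); the
names Request_Maps, Key_Type and its IP and Time components, and the update
operation Set_Session are assumptions made for illustration, and Time is
assumed to be an Ada.Calendar.Time so the difference is a Duration. The
cursor bounds gathered during such a pass are what the Slice proposal would
feed to the finer search and update operations.]

   procedure Assign_Sessions (Log : in out Request_Maps.Map) is
      use Request_Maps;
      Max_Gap    : constant Duration := 30.0 * 60.0;  --  30 minutes
      Session_Id : Natural := 0;
      C          : Cursor := First (Log);
      Prev       : Key_Type;           --  (IP, Time) of previous request
      Have_Prev  : Boolean := False;
   begin
      while Has_Element (C) loop
         declare
            K : constant Key_Type := Key (C);
         begin
            --  A new session starts on a new IP, or on a gap of more
            --  than 30 minutes within the same IP.
            if not Have_Prev
              or else K.IP /= Prev.IP
              or else K.Time - Prev.Time > Max_Gap
            then
               Session_Id := Session_Id + 1;
            end if;

            Set_Session (Log, C, Session_Id);  --  hypothetical update

            Prev      := K;
            Have_Prev := True;
         end;
         Next (C);
      end loop;
   end Assign_Sessions;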

Note that with the increasing availability of large RAM, the tendency is
towards *prevalent* systems, where all data required for search and
retrieval is held in RAM during work. An optimized Ada.Containers library
could mean a great plus for Ada in this area. Databases have been identified
as a promising area for Ada. Perhaps the DADAISM project would not have
stalled if there had been an optimized Ada.Containers around then.

Open_Bound is not strictly required for optimization, but together with
Slice it provides a means to express any kind of interval.

I understand the 20/80 rule. It's just that in my perception the addition of
Slice makes it 20.5/90 or so. Say 21/95 if Open_Bound is added as well.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  7:36 AM

Note that Slice is also useful for non-optimization purposes. For example,
currently to process a "range" you must use the "active" iterator idiom:

      I : Cursor := From;
   begin
      while I <= To loop
         Process (I);
         Next (I);
      end loop;

With Slice you have access to the "passive" idiom right out of the box:

      procedure Iterate is new Generic_Iteration;
   begin
      Iterate (Slice (S, From, To));

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  8:08 AM

> ...you could use the new Generic_Update procedure:
>
>    procedure For_Each_Link_In_Range
>      (Set : Link_Sets.Set_Type; From, To : Link_Type)
>    is
>       pragma Assert (From <= To);
>
>       use Link_Sets;
>
>       procedure Process (E : in out Link_Type) is
>       begin
>          ...; --whatever
>       end;
>
>       procedure Update is new Generic_Update;
>
>       I : Cursor_Type := Lower_Bound (Set, From);
>       J : constant Cursor_Type := Upper_Bound (Set, To);
>    begin
>       while I /= J loop
>          Update (I);
>          Increment (I);
>       end loop;
>    end;

Generic_Update is excellent stuff. It does not apply in this particular case
though, because Mneson links are immutable by design (can only be created or
deleted, never changed). But there are a lot of element update situations in
other applications, and so having Generic_Update is a great improvement from
version 1.1 of the spec and corresponding reference implementation (that is
the one currently used by Mneson, and that does not have
Generic_Update--good thing that Mneson does not need them :-).

****************************************************************

From: Matthew Heaney
Sent: Thursday, May 13, 2004  10:24 AM

This will have to be written as

    I : Cursor := Ceiling (Set, From);
    J : Cursor := Floor (Set, To);
begin
    if J = No_Element then  --To is small key
       pragma Assert (I = First (Set));
       return;
    end if;

    Next (J);  --now has value of Upper_Bound

    while I /= J loop
       Update (I);
       Next (I);
    end loop;
end;

> Generic_Update is excellent stuff. It does not apply in this particular case
> though, because Mneson links are immutable by design (can only be created or
> deleted, never changed). But there are a lot of element update situations in
> other applications, and so having Generic_Update is a great improvement from
> version 1.1 of the spec and corresponding reference implementation (that is
> the one currently used by Mneson, and that does not have
> Generic_Update--good thing that Mneson does not need them :-).

Generic_Update is equivalent to Generic_Element.  The only difference is
that Generic_Update doesn't require the element to be aliased.  It
provides no new functionality relative to the 1.1 spec.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  12:13 PM

I see. I didn't need it for Mneson so it was there in the book but not in my
mind.

Anyway Generic_Update is better because it's pointerless :-)

****************************************************************

From: Tucker Taft
Sent: Thursday, May 13, 2004  10:33 AM

I guess I am still not convinced.  If you use a binary
tree, having a cursor pointing into the tree is not
always terribly useful when you are trying to search
for some subsequent element with a given key.  You will
often have to go back "up" several levels before being
able to go back down.  With "Slice" you are forcing
every operation to support a "virtual" subset as well
as a real set.  This is going to inevitably introduce
some distributed overhead. I would be surprised if on
balance, this is a net savings.  I'm sure you could
construct a case where it would be a savings, but overall,
I would expect the mix of uses would favor keeping the
abstraction simpler.

An alternative is to have additional versions of operations
like "Find" and "Delete" which take a "Starting_With" cursor
parameter.  (This may be something that was there to begin
with, I have forgotten.)  Those might be useful, but still
they seem like operations that might sometimes be slower
than starting at the "top" of the binary tree, depending
on exactly where in the tree the Starting_With cursor points.

The added complexity to the interface just doesn't seem worth it.

There is certainly nothing preventing someone defining a
"Very_Ordered_Set" or whatever that has more of these operations,
making it closer to a "Vector" in interface.  Or it could be
a generic child of Ordered_Set.  I just don't
think the justification is there for our initial attempt at
a standard container library to include these additional
capabilities.

****************************************************************

From: Matthew Heaney
Sent: Thursday, May 13, 2004  10:59 AM

> An alternative is to have additional versions of operations
> like "Find" and "Delete" which take a "Starting_With" cursor
> parameter.  (This may be something that was there to begin
> with, I have forgotten.)  Those might be useful, but still
> they seem like operations that might sometimes be slower
> than starting at the "top" of the binary tree, depending
> on exactly where in the tree the Starting_With cursor points.

That's pretty much my feeling.  It's hard to know a priori whether it's
faster to find an item by starting at the top and then searching the
tree, or starting from some point in the tree and then searching linearly.

Starting from the top does have the benefit that we can definitely say
that the time complexity is O(log n) even in the worst case, which is
why I re-wrote Mario's example to use a top-down search.

I agree that Slice, Open_Bound, etc., aren't necessary.  However, you
could make an argument for including an Upper_Bound style function,
since it's more efficient than the expression Next(Floor(S,K)), and
because it handles the endpoint issue automatically.

In fact I think the issue of endpoint is the more compelling argument
for including Upper_Bound (with some other name, of course), since even
trying to write Mario's example sans Upper_Bound required a bit of
mental effort.  Maybe call it Next_Ceiling or Upper_Ceiling or whatever.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  12:48 PM

Slice does not complicate the abstraction; on the contrary, cf. my example
about iterating a range.

I agree tree implementations might have trouble optimizing certain cases.
But in those cases they can just start at the root as for a no-slice.
But, yes, there is a slight overhead even then, namely for detecting the
kind of case.

Skiplist and hashtable implementations might do better though.

But remember it's not just about optimization, it's also about expressing
ranges declaratively.

Just some final thoughts. By now I think I've made the case for Slice.
Personally as a user I'd like it there. But I might be too biased a user
(towards databases). I'm confident you'll make the right choice. And as
you point out there is always space for an Ada.Containers.Optimized_Sets
package (*), which can mean business for independent Ada tool developers :-)

(*) Is there? I was under the impression that the RM ruled out extensions to
package Ada. But the Ada.Containers spec talks about them as if they were
legal. Sorry for the newbie question.

****************************************************************

From: Martin Dowie
Sent: Thursday, May 13, 2004  2:08 PM

> (*) Is there? I was under the impression that the RM ruled-out extensions
to
> package Ada. But the Ada.Containers spec talks about them as if they were
> legal. Sorry for the newby question.

I believe the rule is you can't add Child packages to package "Ada" but you
can add Grand-Child packages and extend existing Child packages.

****************************************************************

From: Marius Amado Alves
Sent: Thursday, May 13, 2004  1:25 PM

(Damn I said I was done but you keep asking for it :-)

> > An alternative is to have additional versions of operations
> > like "Find" and "Delete" which take a "Starting_With" cursor
> > parameter.

I fail to see how duplicating Insert, Delete, Is_In, and Find complicates the
interface less than simply adding Slice.

> > (This may be something that was there to begin
> > with, I have forgotten.)

There was, but only for Insert, and the known position had to be adjacent to
the new one.

> > Those might be useful, but still
> > they seem like operations that might sometimes be slower
> > than starting at the "top" of the binary tree, depending
> > on exactly where in the tree the Starting_With cursor points.
>
> That's pretty much my feeling.  It's hard to know apriori whether it's
> faster to find an item by starting at the top and then searching the
> tree, or starting from some point in the tree and then searching linearly.

This happens for either Slice or Starting_With. Actually Slice has more
information (the upper bound), which can help make a better decision.

> Starting from the top does have the benefit that we can definitely say
> that the time complexity is O(log n) even in the worst case, which is
> why I re-wrote Mario's example to use a top-down search.

Can you pinpoint please? (Not pressing.)

> I agree that Slice, Open_Range, etc, aren't necessary.  However, you
> could make an argument for including an Upper_Bound style function,
> since it's more efficient than the expression Next(Floor(S,K)), and
> because it handles the endpoint issue automatically.
>
> In fact I think the issue of endpoint is the more compelling argument
> for including Upper_Bound (with some other name, of course), since even
> trying to write Mario's example sans Upper_Bound required a bit of
> mental effort.

Again, can you pinpoint please? (Not pressing.)

>  Maybe call it Next_Ceiling or Upper_Ceiling or whatever.

I take it you don't like Roof :-(

****************************************************************

From: Randy Brukardt
Sent: Thursday, May 13, 2004  11:45 AM

> I guess I am still not convinced.  If you use a binary
> tree, having a cursor pointing into the tree is not
> always terribly useful when you are trying to search
> for some subsequent element with a given key.  You will
> often have to go back "up" several levels before being
> able to go back down.  With "Slice" you are forcing
> every operation to support a "virtual" subset as well
> as a real set.  This is going to inevitably introduce
> some distributed overhead. I would be surprised if on
> balance, this is a net savings.  I'm sure you could
> construct a case where it would be a savings, but overall,
> I would expect the mix of uses would favor keeping the
> abstraction simpler.

I totally agree. Moreover, there is overhead from requiring every
implementation of Sets to support by-reference (not copied) set objects
(that is, the result of Slice). You're also introducing even more
erroneous cases into the library.

Matt will be happy to tell you how hard I tried to eliminate *all*
erroneousness from the containers library. He eventually convinced me that
some cases of dangling cursors cannot be detected (that is, those that point
into container objects that no longer exist). So some erroneousness is
inevitable; but I'm very opposed to having it where it is not required.

(Note that the erroneous cases come from the non-OOP design of the library.
If the container object was a parameter to all operations [as it ought to
be, IMHO], then there would be no need for erroneous cases. But that's water
under the dam. :-)

****************************************************************

From: Nick Roberts
Sent: Friday, May 14, 2004  2:43 PM

I am generally delighted by this amendment, and I hope it goes in. I think
it shows how the knocking together of many wise heads generally produces a
good result (even if it is only after an awful lot of argument :-)

It does seem clear to me that a comprehensive set of packages could easily
have numbered in the hundreds, when one considers the combinations of
different structures and the selection between bounded and unbounded,
definite and indefinite, and so on. I haven't counted, but Booch is over a
hundred, isn't it?

I have a few queries. My profuse apologies if any of these have already been
addressed (and I've missed them).

[1] The vectors and maps are intended to automatically expand when required.
This is fine, but the interface seems to provide no control over this
expansion at all. Would it perhaps be a good idea to add a generic parameter
such as below?

   Expansion_Size: Size_Type := [implementation defined];

The idea is that automatic expansion is done in multiples of Expansion_Size.
It has a default value, so that it can be conveniently ignored by the user.
A possible alternative is:

   Expansion_Factor: Float := [implementation defined];

The idea here is that automatic expansion of a map or vector X is by
Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient
default.

Alternatively, ExpansionSize/Factor could be made a visible discriminant of
the container types, or an invisible attribute (with appropriate get and set
operations).

[2] What was the reason for not permitting Resize to make a container
smaller, please?

[3] I'd quite like the amendment to add a paragraph near the top clarifying
the idea that every container has a set of 'slots', and that each slot can
be either empty or contain the (valid?) value of one element. The following
descriptions could, I think, be made slightly clearer and more succinct by
referring to these slots. (Would you like specific wording?)

[4] Regarding the optimisation of operations, I suggest it may be possible
for an implementation  to keep enough extra internal information (in a Set
object) to enable it to detect and optimise various scenarios (judged to be
typical).

For example, assuming a tree structure, a pointer to the node above the
(terminal) node most recently inserted could be retained; the implementation
could test each insertion to see if it falls under this node; if a sequence
of insertions of (as it turns out) adjacent values occurs, this trick could
yield a very good speed improvement.

[5] Probably already mentioned, but in line 3364 'Assert (Target => V1,
Source => V2);' should be 'Assign (Target => V1, Source => V2);'.

Finally, is there a sample implementation of any of these packages yet?

****************************************************************

From: Matthew Heaney
Sent: Thursday, May 13, 2004  3:05 PM

Nick Roberts wrote:
> I am generally delighted by this amendment, and I hope it goes in. I think
> it shows how the knocking together of many wise heads generally produces a
> good result (even if it is only after an awful lot of argument :-)

Most of the argument you didn't even see...

> It does seem clear to me that a comprehensive set of packages could easily
> have numbered in the hundreds, when one considers the combinations of
> different structures and the selection between bounded and unbounded,
> definite and indefinite, and so on. I haven't counted, but Booch is over a
> hundred isn't it?

Booch is large.  But my original AI-302 proposal was large too: I think
there were something like 25 containers (some of them had bounded and
unbounded forms, etc.), and the proposal itself was about 150 pages.

> I have a few queries. My profuse apologies if any of these have already been
> addressed (and I've missed them).
>
> [1] The vectors and maps are intended to automatically expand when required.

Yes.

> This is fine, but the interface seems to provide no control over this
> expansion at all.

No.  That's what Resize is for.

> Would it perhaps be a good idea to add a generic parameter
> such as below?
>
>    Expansion_Size: Size_Type := [implementation defined];

Use Resize.

> The idea is that automatic expansion is done in multiples of Expansion_Size.
> It has a default value, so that it can be conveniently ignored by the user.
> A possible alternative is:
>
>    Expansion_Factor: Float := [implementation defined];

Use Resize to supply a hint about intended maximum length.  The
implementation then resizes the container according to the algorithm the
vendor has chosen.

> The idea here is that automatic expansion of a map or vector X is by
> Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient
> default.

In the AI-302 reference implementation, the array is automatically
expanded to twice its current size.

> Alternatively, ExpansionSize/Factor could be made a visible discriminant of
> the container types, or an invisible attribute (with appropriate get and set
> operations).

The container types do not have discriminants.

> [2] What was the reason for not permitting Resize to make a container
> smaller, please?

Make a copy of the container, Clear the original, and then Move the copy
to the original.   (Wasn't this in the examples section?)

> [4] Regarding the optimisation of operations, I suggest it may be possible
> for an implementation  to keep enough extra internal information (in a Set
> object) to enable it to detect and optimise various scenarios (judged to be
> typical).
>
> For example, assuming a tree structure, a pointer to the node above the
> (terminal) node most recently inserted could be retained; the implementation
> could test each insertion to see if it falls under this node; if a sequence
> of insertions of (as it turns out) adjacent values occurs, this trick could
> yield a very good speed improvement.

Earlier releases of the AI-302 draft had overloadings of Insert that had
a hint parameter; if the hint was successfully used to perform the
insertion, the time complexity would be O(1) instead of O(log n).

However, the insert-with-hint operations were removed from the API at
the ARG meeting in Phoenix.

> Finally, is there a sample implementation of any these packages yet?

<http://charles.tigris.org/>

See the ai302 subdirectory.

The vector containers in the ai302 subdirectory conform to the most
recent AI-302 draft (dated 2004/04/29).  Look for updates to the
remaining containers this weekend.  (I recommend simply joining the
charles project mailing lists, so you get notified automatically.)

****************************************************************

From: Randy Brukardt
Sent: Friday, May 14, 2004  10:09 PM

> [1] The vectors and maps are intended to automatically expand when required.
> This is fine, but the interface seems to provide no control over this
> expansion at all.

That's intentional. The implementation is allowed to choose the expansion
algorithm that makes the most sense for its architecture. Resize can be
used to tell the implementation the ultimate size; there is an AARM note
telling implementors that it is intended that this do the needed
allocations. Matt claims that Resize often can be used in practice (I'm
skeptical), but when it can't be used, you really don't have enough
information to choose at all.

> [2] What was the reason for not permitting Resize to make a container
> smaller, please?

The same reason that deleting an element doesn't necessarily destroy the
element. We wanted to give the implementation flexibility in using blocking,
caching, etc. The only operation that is guaranteed to recover space is the
destruction of the container.

Matt shows that it can be done by jumping through hoops, so there is a way
to do it in the rare case that it is needed.

> [3] I'd quite like the amendment to add a paragraph near the top clarifying
> the idea that every container has a set of 'slots', and that each slot can
> be either empty or contain the (valid?) value of one element. The following
> descriptions could, I think, be made slightly clearer and more succinct by
> referring to these slots. (Would you like specific wording?)

It's not necessary, and makes things read more like a description of a
specific implementation. We want as abstract a description as possible. We
spent quite a bit of effort getting rid of such wording from the vector and
maps containers (there should be no further reference to "nodes" in those
containers). I would have done the same to the other containers if I had
had more time and energy.

> [5] Probably already mentioned, but in line 3364 'Assert (Target => V1,
> Source => V2);' should be 'Assign (Target => V1, Source => V2);'.

Yes, and I've fixed all of the typos noted by Dan and Christoph in the
working version -- so the ARG won't need to consider them in Palma.

****************************************************************

From: Matthew Heaney
Sent: Friday, May 14, 2004  11:25 PM

> Matt shows that it can be done by jumping through hoops, so
> there is a way to do it in the rare case that it is needed.

Just to add to what Randy said: the point of Resize is to prevent
automatic expansion that would otherwise occur as items are inserted
into the container.  It's not influencing the size that's important per
se; rather, it's disabling expansion.

If you ever need to shrink a vector (say), then just do this:

Shrink:
declare
   Temp : Vector := V;
begin
   Clear (V);
   Move (Target => V, Source => Temp);
end Shrink;

Note that I've been an STL user for 4 years now, and I've never actually
had a need to shrink a vector.  Most of the time I use a vector to store
a large index or whatever, and usually I can determine prior to
insertion how many items I'm going to insert, so I call Resize first.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 14, 2004  11:24 PM

I think you meant "Assign" rather than "Move", as Move just copies the existing
internal contents (thus preserving the size). "Assign" would make the target
only as large as necessary.

****************************************************************

From: Matthew Heaney
Sent: Saturday, May 15, 2004  1:54 AM

No, you've got it backwards.  Move does indeed preserve the size -- of
the source.  Here, Temp has the minimum size necessary to store the
Length (V) elements of V (although the API doesn't actually specify
this).

Note that Move doesn't copy any elements.  The copying happened during
assignment of V to Temp.

Assign copies the active elements of Source onto the existing internal
array of Target, so it doesn't modify the size unless Length (Source) >
Size (Target).

****************************************************************

From: Nick Roberts
Sent: Saturday, May 15, 2004  8:39 AM

> > [2] What was the reason for not permitting Resize to make a container
> > smaller, please?
>
> The same reason that deleting an element doesn't necessarily destroy the
> element. We wanted to give the implementation flexibility in using blocking,
> caching, etc. The only operation that is guaranteed to recover space is the
> destruction of the container.

Well, it may seem like nitpicking, but that seems to be a reason to /allow/
the implementation /not/ to (actually) shrink a container. It doesn't seem
like a reason to /disallow/ the implementation from shrinking it. Surely
allowing an implementation to shrink if it wishes would provide the
greatest flexibility?

I suspect, with respect, that you are being a bit hopeful if you expect
implementations to use blocking, caching, or other optimisations. I doubt
that many will, in practice. And with an implementation close to the model,
there would be no difficulty in shrinking (by reallocation and copying, as
for enlargement). Actually, I think shrinking would probably be feasible for
most implementations, maybe all.

Again, I guess that's arguing the case as strongly as it can be.

> > [3] I'd quite like the amendment to add a paragraph near the top
> > clarifying the idea that every container has a set of 'slots', and that
> > each slot can be either empty or contain the (valid?) value of one
> > element. The following descriptions could, I think, be made slightly
> > clearer and more succinct by referring to these slots. (Would you
> > like specific wording?)
>
> It's not necessary, and makes things read more like a description of a
> specific implementation. We want as abstract a description as possible. We
> spent quite a bit of effort getting rid of such wording from the vector and
> maps containers (there should be no further reference to "nodes" in those
> containers). I would have done the same to the other containers if I would
> have had more time and energy.

Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you
could even say that in the description. I do really think it could
significantly clarify the descriptions. I could do some actual wording, if
you wish.

****************************************************************

From: Nick Roberts
Sent: Saturday, May 15, 2004  8:39 AM

> > [1] The vectors and maps are intended to automatically expand when required.
> > This is fine, but the interface seems to provide no control over this
> > expansion at all.
> > Would it perhaps be a good idea to add a generic parameter
> > such as below?
> >
> >    Expansion_Size: Size_Type := [implementation defined];
> >    Expansion_Factor: Float := [implementation defined];
>
> Use Resize to supply a hint about intended maximum length.  The
> implementation then resizes the container according to the algorithm the
> vendor has chosen.
> In the AI-302 reference implementation, the array is automatically
> expanded to twice its current size.

This seems to correspond with the idea of having something like:

   Expansion_Factor: Float := 2.0;

as a generic parameter.

Such a parameter would not interfere with the use of Resize, wherever the
user could or wanted to use it (and which would certainly be superior where
it could be used). However, it would provide a small extra measure of
control for the user.

An implementation could partially or entirely ignore the value of
Expansion_Factor, if there were better criteria for it to base the decision
on. Since it has a default value, it does not get in the way of the user who
doesn't want to use it.

I don't think its addition would add much complexity to the specifications,
or much burden to implementations. It would actually simplify some
implementations, wouldn't it?

I seem to remember that, back in the days when computers (operating systems)
had fixed-length files on their hard disks, you could usually specify an
expansion size for a file. A file would be automatically reallocated,
expanded by its expansion size, when necessary (just like a vector in the AI,
curiously).

Okay, I think I've argued the case for this feature as strongly as possible
now :-)

> > [2] What was the reason for not permitting Resize to make a container
> > smaller, please?
>
> Make a copy of the container, Clear the original, and then Move the copy
> to the original.   (Wasn't this in the examples section?)

Yes, but that doesn't answer my question, Matt!

> > Finally, is there a sample implementation of any these packages yet?
>
> <http://charles.tigris.org/>
>
> See the ai302 subdirectory.
>
> The vector containers in the ai302 subdirectory conform to the most
> recent AI-302 draft (dated 2004/04/29).  Look for updates to the
> remaining containers this weekend.  (I recommend simply joining the
> charles project mailing lists, so you get notified automatically.)

Great. Thanks.

****************************************************************

From: Ehud Lamm
Sent: Sunday, May 16, 2004  4:59 AM

> An implementation could partially or entirely ignore the value of
> Expansion_Factor, if there were better criteria for it to base the decision
> on. Since it has a default value, it does not get in the way of the user who
> doesn't want to use it.

This makes sense to me. That's the way I usually do it.

****************************************************************

From: Nick Roberts
Sent: Saturday, May 15, 2004  8:48 AM

> > Matt shows that it can be done by jumping through hoops, so
> > there is a way to do it in the rare case that it is needed.
>
> Just to add to what Randy said: the point of Resize is to prevent
> automatic expansion that would otherwise occur as items are inserted
> into the container.  It's not influencing the size that's important per
> se; rather, it's disabling expansion.
>
...

Okay, but it would be way easier to be able to use one call to Resize
instead!

> Note that I've been an STL user for 4 years now, and I've never actually
> had a need to shrink a vector.  Most of the time I use a vector to store
> a large index or whatever, and usually I can determine prior to
> insertion how many items I'm going to insert, so I call Resize first.

Hmm. I think perhaps what you're missing is the case where: (a) you don't
know in advance what size is going to be required; (b) you want to Resize
the vector to something big, so as to minimise (eliminate) reallocations. I
think this is a fairly common scenario. In this kind of case, the user knows
the length of the vector after it has been populated, and would probably
like to be able to issue a simple Resize afterwards to change the size of
the vector to its length (eliminating wasted space). E.g.:

   Open(File,...);
   Resize(Vector,100_000);
   while not End_of_File(File) loop
      Read(File,X);
      Append(Vector,X);
   end loop;
   Close(File);
   Resize(Vector,Length(Vector));

Does this not make sense?

****************************************************************

From: Matthew Heaney
Sent: Saturday, May 15, 2004  11:40 AM

> Okay, but it would be way easier to be able to use one call
> to Resize instead!

Right now Resize has the same semantics as reserve() does in the STL.
You might want to post a note on comp.lang.c++ asking about reserve()
(and its associated function capacity()).  You might also want to
send your question to Musser, Plauger, or Scott Meyers to get their
opinion.

> Hmm. I think perhaps what you're missing is the case where:
> (a) you don't know in advance what size is going to be
> required; (b) you want to Resize the vector to something big,
> so as to minimise (eliminate) reallocations. I think this is
> a fairly common scenario.

In that case I would use a std::deque, not a std::vector, if the number
of elements is large and I need population of the container to be as
fast as possible.

(I had included a deque container in my original proposal, but removed
it after the ARG asked me to reduce its size.  We should revisit this if
there's ever a secondary container library standard.)

>In this kind of case, the user
> knows the length of the vector after it has been populated,
> and would probably like to be able to issue a simple Resize
> afterwards to change the size of the vector to its length
> (eliminating wasted space). E.g.:
>
>    Open(File,...);
>    Resize(V,100_000);
>    while not End_of_File(File) loop
>       Read(File,X);
>       Append(V,X);
>    end loop;
>    Close(File);
>    Resize(V,Length(V));
>
> Does this not make sense?

Read the file into a temporary vector (which has been resized as above),
and then assign it to the real vector V.
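
A minimal sketch of that idiom, reusing the names from the example above
(File, X and V are assumed to be declared as before):

   declare
      Temp : Vector;
   begin
      Open(File,...);
      Resize(Temp,100_000);
      while not End_of_File(File) loop
         Read(File,X);
         Append(Temp,X);
      end loop;
      Close(File);
      V := Temp;  -- V receives a copy of the active elements only
   end;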

The moral of the story is that you can shrink a vector.  We're only
disagreeing about the syntax.

(Note that what I do mostly involves .avi files, which have a header
describing how many frames are in the file.  So in my case I read in the
avi stream header first, and then resize the vector based on the
information in the header.)

****************************************************************

From: Nick Roberts
Sent: Monday, May 17, 2004  2:47 PM

"Matthew Heaney" <matthewjheaney@earthlink.net> wrote:

> > Okay, but it would be way easier to be able to use one call
> > to Resize instead!
>
> Right now Resize has the same semantics as reserve() does in the STL.
> You might want to post a note on comp.lang.c++ asking about reserve()
> (and its associated function capacity()).  You might want also want to
> send your question to Musser, Plauger, or Scott Meyers to get their
> opinion.

I must say, that seems like a very evasive answer. Can you not give a direct
answer to the question "Why not permit Resize to reduce the size of a
vector?" Why does the Ada standard need to do what the STL does?

> > Hmm. I think perhaps what you're missing is the case where:
> > (a) you don't know in advance what size is going to be
> > required; (b) you want to Resize the vector to something big,
> > so as to minimise (eliminate) reallocations. I think this is
> > a fairly common scenario.
>
> In that case I would use a std::deque, not a std::vector, if the number
> of elements is large and I need population of the container to be as
> fast as possible.
>
> (I had included a deque container in my original proposal, but removed
> it after the ARG asked me to reduce its size.  We should revisit this if
> there's ever a secondary container library standard.)

In which case, I must ask what is the point of providing the vector
abstraction at all? What does it provide that is not bettered, in practice,
either by Ada's intrinsic arrays or by the list abstraction?

> ...
> The moral of the story is that you can shrink a vector.  We're only
> disagreeing about the syntax.

Yes, we are disagreeing about the syntax. I am suggesting that the syntax:

   Resize(V,Length(V));

is a big improvement upon:

   declare
      Temp : Vector := V;
   begin
      Clear (V);
      Move (Target => V, Source => Temp);
   end Shrink;

and I do not see -- and I have not been given -- any reason why the former
should not be permitted.

> (Note that what I do mostly involves .avi files, which have a header
> describing how many frames are in the file.  So in my case I read in the
> avi stream header first, and then resize the vector based on the
> information in the header.)

In which case, why do you not simply use an array?

****************************************************************

From: Randy Brukardt
Sent: Monday, May 17, 2004  5:07 PM

> In which case, I must ask what is the point of providing the vector
> abstraction at all? What does it provide that is not bettered, in practice,
> either by Ada's intrinsic arrays or by the list abstraction?

Because Matt is very tied (in his mind) to a particular implementation. The
containers as described in AI-302-03 are much more abstract, and do not have
a prescribed implementation. Janus/Ada will probably use a two-level
implementation for vector (which is more like what Matt calls a "Deque"),
because the extra cost of such an implementation is quite low in return for
the benefits that will be available. (It also maps much better to the
code-shared generics of Janus/Ada).

...
> > (Note that what I do mostly involves .avi files, which have a header
> > describing how many frames are in the file.  So in my case I read in the
> > avi stream header first, and then resize the vector based on the
> > information in the header.)
>
> In which case, why do you not simply use an array?

I've made this point many times, and you're never going to get a
satisfactory answer. It's best to let it go. (Otherwise, we'll use up the
entire budget for Ada 2005 discussing trivialities, and there will not be
any money to build the RM...)

Earlier, Nick wrote:

> This seems to correspond with the idea of having something like:
>   Expansion_Factor: Float := 2.0;
> as a generic parameter.

This is very specific to a particular implementation. We don't want that
much specification of the implementation.

...
> An implementation could partially or entirely ignore the value of
> Expansion_Factor, if there were better criteria for it to base the decision
> on. Since it has a default value, it does not get in the way of the user who
> doesn't want to use it.

We don't want a parameter whose value can be ignored. Resize itself is bad
enough.

In any event, micro-managing memory use is not what containers are about.
You use them when you want the system to manage memory for you. If you care
deeply about memory use, you need to build your own abstractions. If you
don't, a compiler update could completely destroy your system's performance.
(You can't rely on predefined stuff for critical time/space performance.)

...
> I suspect, with respect, that you are being a bit hopeful if you expect
> implementations to use blocking, caching, or other optimisations. I doubt
> that many will, in practice. And with an implementation close to the model,
> there would be no difficulty in shrinking (by reallocation and copying, as
> for enlargement). Actually, I think shrinking would probably be feasible for
> most implementations, maybe all.

IBM Rational insisted on weakening some of the requirements so that they
could use alternative implementations. Similarly, I've been very concerned
about specifying an implementation, simply because Matt's implementations
would be outrageously slow if compiled for Janus/Ada (due to generic code
sharing). I fully intend to use a two-level scheme for vectors. All of the
containers will use limited free lists to avoid excess allocation. I've
considered allocation blocking for lists (but it wouldn't work for
Janus/Ada, so we won't do that). Now, some vendors may simply use Matt's
implementations, but it's pretty clear that at least some vendors are not
planning to do so.

> Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you
> could even say that in the description. I do really think it could
> significantly clarify the descriptions. I could do some actual wording, if
> you wish.

But we don't need it! Containers just hold a number of elements; all else is
specific to particular implementations, and does not really belong in the
standard. We made a number of specific exceptions to that to allow inserting
of empty elements into vectors for performance reasons (similar to the
reason that Resize exists). Those do not need any wrapper concept.

As I said, Matt's original text had "nodes" in many places, and I took them
out as much as possible. It generally shortened the wording; there were no
cases where it helped anything. (It's more useful in the List container, but
even there, it would be best to remove it. Just no more energy or budget.)

And no thanks, I don't have any energy or budget to spend training someone
how to write Standard wording. Especially when it isn't necessary. I don't
doubt that there exist paragraphs that need wordsmithing, but I think the
overall wording is about on the right level.

****************************************************************

From: Pascal Obry
Sent: Tuesday, May 18, 2004  12:55 AM

 > Because Matt is very tied (in his mind) to a particular implementation. The
 > containers as described in AI-302-03 are much more abstract, and do not have
 > a prescribed implementation.

This is not true for the map. The name is Indefinite_Hashed_Maps. This states
clearly that the implementation uses a hash table. I have found that while a
hash table is very fast for small sets of data (< 10_000), it is quite a bit
slower than an AVL tree for very large sets of data (> 100_000). Maybe this is
specific to the current reference implementation, but that's what I have
experienced. FYI, the AVL implementation I'm talking about is
Table_Of_*_And_Dynamic_Data_G from the LGL.

****************************************************************

From: Randy Brukardt
Sent: Friday, May 19, 2004  7:14 PM

The "Hashed_Maps" and "Ordered_Sets" cases are special. I think everyone would
have preferred to avoid specifying an implementation there as well. But that's
impossible, because of the vastly different generic parameters needed. That is,
a "Hashed_Map" takes a hash function as a generic parameter, while an
"Ordered_Set" (implemented as a tree) takes generic ordering operators as
generic parameters. So, that exposes the basic implementation, as does any
ordering requirements (hash tables aren't ordered by definition). Given these
basic properties differ, a container where the hash vs. tree implementation
isn't specified doesn't make sense.

I do have to wonder about your results. Since an AVL tree is going to be log N
access by key, it should be quite a bit slower in large collections. The only
reason for a hash table to slow down is a bad hash function (which then could
make long chains in a few buckets) - essentially turning lookups into brute
force searches. Are you sure that your hash function is good enough for "large
sets of data"? An ideal function would put one item into each bucket.

****************************************************************

From: Pascal Obry
Sent: Saturday, May 22, 2004  2:04 AM

The hash routine was not good at all. We discussed this with Matthew;
using a standard one (close to a hash routine used to implement associative
arrays in Tcl or Gawk), the hash table is now twice as fast.
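
For illustration only, a hypothetical hash in the same multiplicative spirit
(h := h * 9 + next character, roughly the classic Tcl scheme), not the exact
routine adopted for AWS:

   type Hash_Type is mod 2 ** 32;  -- assumed modular result type

   function Simple_Hash (Key : String) return Hash_Type is
      H : Hash_Type := 0;
   begin
      for I in Key'Range loop
         H := H * 9 + Hash_Type (Character'Pos (Key (I)));
      end loop;
      return H;
   end Simple_Hash;

A multiplicative hash like this spreads typical string keys over the buckets
much better than, say, simply summing the characters.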

****************************************************************

From: Michael F. Yoder
Sent: Saturday, May 22, 2004  11:40 AM

I've seen bad behavior with hashing many times, both in personal and
professional contexts.  The basic reason is: if you use a fixed table
size and linear chaining within a bucket, hashing is linear (albeit with
a small constant) and large datasets can perform very badly even if the
hash function is good.  I don't recall the problem ever being a bad hash
function, though it could have occurred and I've forgotten.

My own solution was to expand the table size when it becomes 3/4 full or
so (using internal rather than external chaining); it might be better to
make each bucket be a tree.  The latter solution has a security benefit:
it mitigates DOS attacks based on causing collisions deliberately.  This
consideration occurred at my last job, but admittedly isn't a common
one.  For what it's worth, the use of an expanding table has always
solved the problem.
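
As a purely hypothetical illustration of the "3/4 full" rule (the names and
the exact threshold are made up for this sketch, not part of any proposed
API):

   function Needs_Expansion (Length, Buckets : Natural) return Boolean is
   begin
      return Length * 4 >= Buckets * 3;   -- load factor of 0.75 or more
   end Needs_Expansion;

When this returns True, the table is reallocated with, say, twice the number
of buckets, and the existing elements are rehashed into it.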

****************************************************************

From: Tucker Taft
Sent: Saturday, May 22, 2004  3:42 PM

The Hash_Maps are intended to be expandable hash tables.
That's what Resize() is all about.  And yes, I expect the only
reason AVLs might start to outperform a hash table is if the
hash table has a fixed number of buckets.

****************************************************************

From: Randy Brukardt
Sent: Saturday, May 22, 2004  3:46 PM

The containers library uses an expanding hash table. The only way the
behavior can get bad is if the hash function isn't good enough to use most
of the buckets.

****************************************************************

From: Ehud Lamm
Sent: Sunday, May 16, 2004  4:54 AM

> Tucker wrote:
>
> > I guess I am still not convinced.  If you use a binary
> > tree, having a cursor pointing into the tree is not
> > always terribly useful when you are trying to search
> > for some subsequent element with a given key.  You will
> > often have to go back "up" several levels before being
> > able to go back down.  With "Slice" you are forcing
> > every operation to support a "virtual" subset as well
> > as a real set.  This is going to inevitably introduce
> > some distributed overhead. I would be surprised if on
> > balance, this is a net savings.  I'm sure you could
> > construct a case where it would be a savings, but overall,
> > I would expect the mix of uses would favor keeping the
> > abstraction simpler.
>
> I totally agree. Moreover, there is overhead from requiring every
> implementation of Sets to support by-reference, not copied
> set objects. (That is, the result of Slice).

This is also the way I see it.
Perhaps I missed something, so let me put it bluntly: are we talking ADT
interfaces here, or are we working solely for a specific implementation?
As you know from our Ada-Europe workshop a couple of years ago, I am firmly
in the ADT camp myself, so I prefer interfaces that don't impose too many
implementation restrictions. They can then be extended at will -- much easier
than removing operations that are hard or inefficient to support.

****************************************************************

From: Marius Amado Alves
Sent: Monday, May 17, 2004  1:01 PM

[Slice et al.]

> Perhaps I missed something, so let me put it bluntly: are we talking ADT
> interfaces here, or are we working solely for a specific implementation?

Both. Slice provides a way to express ranges declaratively (interface) and a
way to pass information to operations that can use it to optimize
(implementation, but not specific).

(Just clarifying. The cases have been made, the tendency of the ARG is to
leave Slice out, so it's only academic now.)

****************************************************************

From: Ehud Lamm
Sent: Sunday, May 16, 2004  5:03 AM

> From: Matthew Heaney [mailto:mheaney@on2.com]
>
> Tucker Taft wrote:
>
> > I don't think we need to change
> > "Previous" to make these equivalences work for
> > endpoints.   Just let the user write a
> > "Previous_Or_Last" if they really want to,
> > which would need to take both a cursor and a set.
> > Or more directly, write Lower_Limit or Upper_Limit
> > if you want them, since these already have enough
> > information with the set and the key.
> >
> > Providing Ceiling and Floor still seems adequate to me,
> > as they provide the needed primitives for all other
> > operations mentioned thus far.
>
> OK.  That seems reasonable.  I just wanted to make sure we
> were on the
> same page w.r.t the behavior at the endpoints.

It does seem reasonable, and since I have never used this sort of operation, my
opinion shouldn't count as much, so take this with a grain of salt...

It looks like the equivalences help understand what's going on. The special
cases make code less readable and the logic a bit less clear. How important
this is, is hard to judge.

I wager many students will forget about the special case. Why not provide
Lower_Limit or Upper_Limit? The cost seems tiny.

****************************************************************

From: Matthew Heaney
Sent: Monday, May 17, 2004  1:15 PM

I am in favor of providing the following four operations:

Lower_Limit (S, K) < K    (AKA "Ground", "Previous_Floor")
Floor (S, K) <= K
Ceiling (S, K) >= K       (AKA Lower_Bound)
Upper_Limit (S, K) > K    (AKA Upper_Bound, "Roof", "Next_Ceiling")

I think Tucker only wants the middle two.

If I had to pick only two, then I'd pick the last two (Ceiling and
Upper_Limit).  (This is what the STL & Charles do, and what was in the
API prior to the ARG meeting in Phoenix.)

Note that there are really two separate issues:

(1) What is the value of the expression:

   Previous (Next (C))

We got rid of the internal sentinel node in Phoenix, which means once a
cursor assumes the value No_Element, then it keeps that value.

This is what Tucker and I were discussing in the earlier message quoted
above, about letting a user define a Previous_or_Last function if he
needs to back up onto the actual sequence.
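
A sketch of such a user-written helper, assuming the draft set operations
Last and Previous and the constant No_Element:

   function Previous_Or_Last (Container : Set;
                              Position  : Cursor) return Cursor is
   begin
      if Position = No_Element then
         return Last (Container);   -- back up onto the actual sequence
      else
         return Previous (Position);
      end if;
   end Previous_Or_Last;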

(2) Restoring the functionality of the two operations formerly known as
"Lower_Bound" and "Upper_Bound".

There seems to be agreement that this functionality is useful.  One of
the issues is that several of the ARG reviewers were confused by the
names "Lower_Bound" and "Upper_Bound".

****************************************************************

From: Tucker Taft
Sent: Monday, May 17, 2004  1:49 PM

Will this never end? ;-)

My *major* complaint with Upper_Limit, Lower_Limit,
Upper_Bound, Lower_Bound, etc. is that the names
make no intuitive sense.

If you could come up with some reasonable names,
I might support the inclusion.  I do not find
any of the ones that have been proposed thus far
acceptable.

Predecessor and Successor might make it, where they
are allowed to take a key that might or might
not appear in the set, and return the cursor for
the item in the set next preceding or following the given key.

****************************************************************

From: Michael F. Yoder
Sent: Wednesday, May 19, 2004  11:48 AM

Whether 2 or 4 operations are included, it would be pleasant if the
names came from a consistent scheme.  For example:

    Lt_Item (S, K) < K
    Le_Item (S, K) <= K
    Gt_Item (S, K) > K
    Ge_Item (S, K) >= K

This is easier to do if the "Lt" and "Gt" operations are the only two
provided.  For example, 'Predecessor' and 'Successor' would be fine.
Floor for Le_Item and Ceiling for Ge_Item, together with Predecessor and
Successor, would be acceptable.

****************************************************************

From: Christoph Grein
Sent: Sunday, May 23, 2004  11:37 PM

I do think the names at the right intuitively describe the meaning:

     Gt_Item (S, K) >  K         Roof
     Ge_Item (S, K) >= K         Ceiling
     Le_Item (S, K) <= K         Floor
     Lt_Item (S, K) <  K         Ground, Basement

It's like a building, you're in a room, which has a floor and a ceiling; above
is the roof (or the attic), below the basement or ground.

****************************************************************

From: Marius Amado Alves
Sent: Monday, May 24, 2004  5:26 AM

":=" for containers clones the source (as opposed to passing a reference
to).

Do I understand correctly that this behaviour is specified solely by the
fact that containers are non-limited?

In that case, wouldn't a small clarifying Note be useful, especially for new
users coming e.g. from... uh... Java...

And shouldn't the behaviour of ":=" be documented for any controlled type
anyway?
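
For illustration, with X and Y standing for any two element values:

   declare
      A, B : Vector;
   begin
      Append (A, X);
      B := A;          -- B is an independent copy (a clone) of A
      Append (B, Y);   -- modifies B only; Length (A) is still 1
   end;

If ":=" passed a reference instead, the second Append would also be visible
through A.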

****************************************************************

From: Matthew Heaney
Sent: Wednesday, June 9, 2004  9:40 AM

I have a few comments on the Phoenix release of AI-302 (2004-04-29
AI95-00302-03/03).   Each comment is bracketed with "MJH:" and "ENDMJH."
pairs, and immediately follows the item to which it refers.

-Matt


A.17.2 The Package Containers.Vectors

generic
...
package Ada.Containers.Vectors is
...
    function To_Vector (Count : Size_Type) return Vector;

MJH:
I wasn't absolutely sure whether the formal param should be named
"Count" or "Length".   The term "count" is used elsewhere in this spec,
but here it actually specifies the length of the vector object returned
by the function.
ENDMJH.


    function To_Vector
      (New_Item : Element_Type;
       Count    : Size_Type) return Vector;

MJH:
This is formatted inconsistently.  It should be:

    function To_Vector (New_Item : Element_Type;
                        Count    : Size_Type)
      return Vector;
ENDMJH.




    procedure Set_Length (Container : in out Vector;
                          Length    : in     Size_Type);

MJH:
Should we include the following operation too?

    procedure Set_Length (Container : in out Container_Type;
                          Length    : in     Size_Type;
                          New_Item  : in     Element_Type);

This would allow the user to specify an actual value for the new
elements, if the length of the vector is increased.
ENDMJH.


    procedure Swap (Container : in out Vector;
                    I, J      : in     Cursor);

MJH:
Should we weaken the precondition, allowing the case in which both I and
J have the value No_Element?  In that case Swap would be a no-op.
(Right now I think it's an error.)
ENDMJH.


function To_Index (Position  : Cursor) return Index_Type'Base;

If Position is No_Element, Constraint_Error is propagated. Otherwise, the
index (within its containing vector) of the element designated by Cursor is
returned.

MJH:
Should this be reworded to say "If Has_Element (Position) is False..."?
ENDMJH.

MJH:
Also, note that if Position may only designate an active element in the
container, then we don't need to return Index_Type'Base.  We can
strengthen the post-condition by returning Index_Type.
ENDMJH.

AARM Note: This implies that the index is determinable from a bare cursor
alone. The basic model is that a vector cursor is implemented as a record
containing an access to the vector container and an index value. This does
constrain implementations, but it also allows all of the cursor operations
to be defined in terms of the corresponding index operation (which should be
primary for a vector).

MJH:
It's not clear if CE is supposed to be propagated if Position does not
specify a value within the range of currently active elements of
Container.  For example:

declare
    V : Vector;
    C : Cursor;
    I : Index_Type'Base;
begin
    Append (V, E);
    C := First (V);
    Delete_First (V);
    I := To_Index (C); --valid?
end;
ENDMJH.


generic
    with procedure Process (Element : in out Element_Type) is <>;
procedure Generic_Update_by_Index (Container : in Vector;
                                    Index     : in Index_Type'Base);

If Index is not in the range First_Index (Container) .. Last_Index
(Container),
then Constraint_Error is propagated. Otherwise, it calls the generic actual
bound to Process with the element at position Index as the parameter. Any
exceptions raised by Process are propagated.

If Element_Type is unconstrained and definite, then the Element parameter
shall be unconstrained.

AARM Note: This means that the elements cannot be aliased nor directly
allocated from the heap; it must be possible to change the discriminants
of the element in place.

The element at position Index is not an empty element after successful
completion of this operation.

AARM Note: Since reading an empty element is a bounded error, attempting to
use this procedure to replace empty elements may fail. Use Replace_Element
to do that reliably.

MJH:
What did we conclude about this?  I thought using Generic_Update to
initialize an empty element was ok?  (Or was that only for a list?)

Is this AARM Note in conflict with the note below?
ENDMJH.



procedure Replace_Element (Position : in Cursor;
                            By       : in Element_Type);

This procedure assigns the value By to the element designated by Position.
If Position equals No_Element, then Constraint_Error is propagated.
Any exceptions raised during the assignment are propagated. The element
designated by Position is not an empty element after successful
completion of
this operation.

AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are
the only ways that an element can change from empty to non-empty.

MJH:
Is this AARM Note in conflict with the note above?
ENDMJH.



procedure Insert (Container : in out Vector;
                   Before    : in     Cursor;
                   New_Item  : in     Vector);

If Before is No_Element, then is equivalent to Insert (Container,
Index_Type'Succ (Last_Index (Container)), New_Item); otherwise is
equivalent to Insert (Container, To_Index (Before), New_Item);

MJH:
Should this be reworded to say "Has_Element (Before) = False..." instead?
ENDMJH.

MJH:
We probably need to say here that if New_Item is empty, then the
operation has no effect.  Otherwise there's a constraint check: if
Before = No_Element, IT'Succ (Cont.Last) can fail when Cont.Last = IT'Last.
ENDMJH.

MJH:
Here and elsewhere the equivalence is in terms of To_Index, but this
might be too restrictive.  Before is allowed to be IT'Succ (Cont.Last),
but I think To_Index raises an exception if it has that value.
ENDMJH.


procedure Insert (Container : in out Vector;
                   Before    : in     Cursor;
                   New_Item  : in     Vector;
                   Position  :    out Cursor);

Create a temporary (call it Temp_Index) and set it to Index_Type'Succ
(Last_Index (Container)) if Before equals No_Element, and To_Index (Before)
otherwise. Then Insert (Container, Before, New_Item) is called, and finally
Position is set to To_Cursor (Container, Temp_Index).

AARM Note: The wording is messy because Before is invalidated by Insert, and we
don't want Position to be invalid after this call. An implementation probably
only needs to copy Before to Position.

MJH:
See note above.
ENDMJH.



procedure Insert (Container : in out Vector;
                   Before    : in     Cursor;
                   New_Item  : in     Element_Type;
                   Count     : in     Size_Type := 1);

Equivalent to Insert (Container, Before, To_Vector (New_Item, Count));

MJH:
See note above when Count = 0.  (We should state explicitly that if
Count=0, then the operation is a no-op, and there are no constraint
checks or any other exceptions.  The value or state of cursor Before is
not checked or otherwise considered, when Count=0.)
ENDMJH.


procedure Insert (Container : in out Vector;
                   Before    : in     Cursor;
                   New_Item  : in     Element_Type;
                   Position  :    out Cursor;
                   Count     : in     Size_Type := 1);

Equivalent to Insert (Container, Before, To_Vector (New_Item, Count),
Position);

MJH:
See note above re Count=0.
ENDMJH.


procedure Prepend (Container : in out Vector;
                    New_Item  : in     Vector;
                    Count     : in     Size_Type := 1);

Equivalent to Insert (Container, Index_Type'First, New_Item).

MJH:
Typo: this declaration should look like this:

procedure Prepend (Container : in out Vector;
                    New_Item  : in     Vector);
ENDMJH.




procedure Insert_Space (Container : in out Vector;
                         Before    : in     Cursor;
                         Position  :    out Cursor;
                         Count     : in     Size_Type := 1);

Create a temporary (call it Temp_Index) and set it to
Index_Type'Succ (Last_Index (Container)) if Before equals No_Element, and
To_Index (Before) otherwise. Then Insert_Space (Container, Temp_Index,
Count) is called, and finally Position is set to To_Cursor (Container,
Temp_Index).

MJH:
See note above re count=0.
ENDMJH.



procedure Delete (Container : in out Vector;
                   Position  : in out Cursor;
                   Count     : in     Size_Type := 1);

If Count is 0, the operation has no effect. Otherwise is equivalent to
Delete (Container, To_Index (Position), Count).

MJH:
Here and elsewhere when Count is 0, I think we need to specify what
value for Position is returned.
ENDMJH.

MJH:
If Count is non-zero, then how should we handle a Position that does
not designate an active element?  Above, we raise CE.  Is this correct?
ENDMJH.

MJH:
We probably need to say here that Position is set to (a cursor for)
Position.Index if that index still designates an element in the container, or
to No_Element if Position was part of the tail that was deleted.
ENDMJH.



procedure Delete_Last (Container : in out Vector;
                        Count     : in     Size_Type := 1);

If Length (Container) < Count then is equivalent to
Delete (Container, Index_Type'First, Count); otherwise
is equivalent to Delete (Container,
Index_Type'Val(Index_Type'Pos(Last_Index(Container)) - Count + 1), Count).

MJH:
If Length (C) >= Count, then isn't it easier to simply say that it's
the same as Clear (C)?
ENDMJH.


Returns the value Index_Type'First.

MJH:
What operation does this description refer to?  I assume it's First_Index.
ENDMJH.


procedure Swap (Container : in Vector;
                 I, J      : in Cursor);

Equivalent to Swap (Container, To_Index (I), To_Index (J)).

MJH:
I mentioned this above.  We might want to weaken the precondition of
Swap, to allow cursors for both of which Has_Element returns False to be
swapped; that is, if both are No_Element, then Swap should be a no-op.
ENDMJH.




function Find (Container : Vector;
                Item      : Element_Type;
                Index     : Index_Type'Base := Index_Type'First)
    return Index_Type'Base;

Searches the elements of Container for an element equal to Item,
starting at position Index. If Index is less than Index_Type'First,
then Constraint_Error is propagated. If there are no elements in the
range Index .. Last_Index (Container) equal to Item, then Find returns
Index_Type'Succ (Last_Index (Container)). Otherwise, it returns the index of
the matching element.

MJH:
Here and in the other find ops we should probably weaken the precondition,
such that if the container is empty, we return failure status
immediately, without vetting or otherwise interrogating the value of Index.
ENDMJH.

function Find (Container : Vector;
                Item      : Element_Type;
                Position  : Cursor := No_Element)
    return Cursor;

Searches the elements of Container for an element equal to Item,
starting at the first element if Cursor equals No_Element, and at
the element designated by Cursor otherwise, and searching to the last
element in Container. If an item equal to Item is found, Find returns a
cursor designating the first element found equal to Item. If no such item is
found, it returns No_Element.

MJH:
Suppose Has_Element (Position) = False: is this an error (raise CE), or
does it count as No_Element (start from IT'First)?
ENDMJH.



A.17.3 The Package Containers.Doubly_Linked_Lists


procedure Delete (Container : in out List;
                   Position  : in out Cursor;
                   Count     : in     Size_Type := 1);

If Position equals No_Element, the operation has no effect. Otherwise
Delete removes Count nodes starting at the node designated by Position
from Container (or all of the nodes if there are less than Count nodes
starting at Position). Any exceptions raised during deallocation of internal
storage are propagated.

MJH:
Is this inconsistent with vector?  I think we made it an error if
Size > 0 and Position = No_Element.   (I don't know which way we should
go, I just wanted to bring it up.)
ENDMJH.



procedure Swap (Container : in out List;
                 I, J      : in     Cursor);

Swap exchanges the nodes designated by I and J.

MJH:
Allow I and J to both assume the value No_Element?
ENDMJH.

MJH:
Does this swap nodes (by exchanging pointers), or does it
leave the nodes in their relative positions, and merely
exchange the values of the elements on those nodes?
ENDMJH.



A.17.5 The Package Containers.Ordered_Sets

generic
...
package Ada.Containers.Ordered_Sets is
...
    procedure Insert (Container : in out Set;
                      New_Item  : in     Element_Type;
                      Position  :    out Cursor;
                      Success   :    out Boolean);

    --MJH:
    --A nice operation might be:
    --procedure Insert (Container : in out Set;
    --                  New_Item  : in     Element_Type);
    --This is a convenience operation that omits the last two params.
    --ENDMJH.


    function Is_Subset (Item      : Set;
                        Container : Set)
       return Boolean;

MJH:
Clarify the results when one or both of the params are empty sets.  (I
assume that in set theory, the subset operation is defined on a pair of
null sets, but I don't remember offhand what the value is.)
ENDMJH.


    function Is_Disjoint (Item      : Set;
                          Container : Set)
       return Boolean;

MJH:
As above, clarify the results when one or both of the params are empty sets.
ENDMJH.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 9, 2004  11:59 PM

A couple of comments on Matt's comments (I'm not going to comment on typos
and the like, it's too late to fix them before the meeting, and they're
recorded).

> function To_Index (Position  : Cursor) return Index_Type'Base;
>
> If Position is No_Element, Constraint_Error is propagated. Otherwise, the
> index (within its containing vector) of the element designated by
> Cursor is
> returned.
>
> MJH:
> Should this be reworded to say "If Has_Element (Position) is False..."?
> ENDMJH.

I don't think so. It's usually a bounded error to use a cursor that doesn't
point at an active element. That allows either raising Constraint_Error or
doing something else. You explain why below...

...
> MJH:
> It's not clear if CE is supposed to be propagated if Position does not
> specify a value within the range of currently active elements of
> Container.  For example:
>
> declare
>     V : Vector;
>     C : Cursor;
>     I : Index_Type'Base;
> begin
>     Append (V, E);
>     C := First (V);
>     Delete_First (V);
>     I := To_Index (C); --valid?
> end;
> ENDMJH.

It's very clear that this is a bounded error, and we're *not* requiring
implementations to detect this case (in this specific example, because
Delete is called on an element to the left). But we *allow* it to be
detected. I thought we had agreed that we didn't want the overhead of
detecting these kinds of errors.

The organization of the standard requires us to put the bounded error text
far away from this subprogram (which is unfortunate), but since it is a
general rule, that isn't too bad.

The bounded error rules apply to *all* uses of cursors except Has_Element,
so the answer is the same for all other routines.

> generic
>     with procedure Process (Element : in out Element_Type) is <>;
> procedure Generic_Update_by_Index (Container : in Vector;
>                                     Index     : in Index_Type'Base);
...
> MJH:
> What did we conclude about this?  I thought using Generic_Update to
> initialize a space element was ok?  (Or was that only for a list?)

It's also in the bounded error section. I think we concluded that we
couldn't allow Generic_Update, because it implies a read of the element. I
tried to find a way to avoid that, but if we did, then it wouldn't be
"Update" any more.

...
> AARM Note: Replace_Element, Generic_Update, and
> Generic_Update_by_Index are
> the only ways that an element can change from empty to non-empty.
>
> MJH:
> Is this AARM Note in conflict with the note above?
> ENDMJH.

Someone asked that in April. Sheesh. Generic_Update is in the list because
it's a bounded error to call it, and *if* it doesn't raise an exception,
*then* it changes the element to non-empty. But you can't depend that it
doesn't raise an exception.

...
> procedure Delete (Container : in out List;
>                    Position  : in out Cursor;
>                    Count     : in     Size_Type := 1);
>
> If Position equals No_Element, the operation has no effect. Otherwise
> Delete removes Count nodes starting at the node designated by Position
> from Container (or all of the nodes if there are less than Count nodes
> starting at Position). Any exceptions raised during deallocation
> of internal storage are propagated.
>
> MJH:
> Is this inconsistent with vector?  I think we made it an error if
> Size > 0 and Position = No_Element.   (I don't know which way we should
> go, I just wanted to bring it up.)
> ENDMJH.

Yes, it seems to be inconsistent with Vector. Vector raises C_E for indexes
out of range (of course), and the cursor version mimics that behavior,
because it really can't do anything else. So I'd say this probably ought to
raise C_E as well.

****************************************************************

From: Matthew Heaney
Sent: Thursday, June 10, 2004  10:32 AM

> It's very clear that this is a bounded error, and we're *not* requiring
> implementations to detect this case (in this specific example, because
> Delete is called on an element to the left). But we *allow* it to be
> detected. I thought we had agreed that we didn't want the overhead of
> detecting these kinds of errors.

OK, I just wanted to make sure.


The other thing I forgot to mention is that the following operations are
in the list package but not the vector package:


    procedure Delete (Container : in out List;
                      Item      : in     Element_Type);


    generic
       with function Predicate (Element : Element_Type)
          return Boolean is <>;
    procedure Generic_Delete (Container : in out List);


    procedure Reverse_List (Container : in out List);


    generic
       with function Predicate (Element : Element_Type)
          return Boolean is <>;
    function Generic_Find (Container : List;
                           Position  : Cursor := No_Element)
       return Cursor;


    generic
       with function Predicate (Element : Element_Type)
          return Boolean is <>;
    function Generic_Reverse_Find (Container : List;
                                   Position  : Cursor := No_Element)
       return Cursor;


There's no technical reason they should be in the list but not the 
vector.  Either we can add them to vector, or get rid of them for list.


Here's another idea.  We already have a Generic_Update, but another 
useful operation might be some kind of query operation, that either 
returns Boolean or a type you pass in as a generic formal.  Something like:

generic
    type Result_Type (<>) is limited private;
    function Process (E : ET) return Result_Type is <>;
function Generic_Query (Position : Cursor) return Result_Type;

Of course, a user could implement this as (here, for a Boolean Result_Type):

function Query (P : C) return Boolean is
    Result : Boolean;

    procedure Process (E : in out ET) is
    begin
       Result := Predicate (E); -- some algorithm
    end;

    procedure Update is new Generic_Update;
begin
    Update (P);
    return Result;
end;

The awkward case is when the Result_Type actual type is indefinite.  For 
example, were it type String you would have to use an unbounded_string 
or whatever as the temporary (but maybe that's not such a big deal).
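
A sketch of that workaround, in the style of the example above (it would also
need a with clause for Ada.Strings.Unbounded; Image stands for some
hypothetical user operation on the element):

   function Query_Image (P : C) return String is
      Temp : Ada.Strings.Unbounded.Unbounded_String;

      procedure Process (E : in out ET) is
      begin
         Temp := Ada.Strings.Unbounded.To_Unbounded_String (Image (E));
      end;

      procedure Update is new Generic_Update;
   begin
      Update (P);
      return Ada.Strings.Unbounded.To_String (Temp);
   end;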

Clearly you can implement a query-style function from the update 
modifier operation, but I wasn't sure whether that's possible in all 
cases for all possible return types, and if so whether this warrants the 
introduction of a dedicated operation.

****************************************************************

From: Randy Brukardt
Sent: Thursday, June 10, 2004  6:50 PM

...
> There's no technical reason they should be in the list but not the
> vector.  Either we can add them to vector, or get rid of them for list.

I'd be wary of adding too many rarely used routines to these containers.
Those just make the containers harder to learn and harder to implement with
little additional benefit.

Unbounded_Strings has a large number of rarely used routines, and yet it
never seems to have the odd routine I actually need. So, that actually
increases the frustration level, because you'd think that in so many
routines, every plausible need would be met. When there are fewer routines,
the expectation level is lower, too, and you wouldn't feel quite so
ripped-off.

In the routines you mentioned, I think that the generic routines are too
specialized - it would be rare that you both could match their usage pattern
*and* would remember that they exist. Delete by item seems error-prone if
there are multiple identical items in the container (does it delete just one
or all of them? Explain your choice, and why the user would expect that over
the other possibility.) Reverse_List (which probably should just be called
"Reverse") doesn't seem that useful, and is masking a lot of work. So I'd
probably dump the whole lot. But I do agree that List and Vector should be
the same, whatever is decided.

> Here's another idea.  We already have a Generic_Update, but another
> useful operation might be some kind of query operation, that either
> returns Boolean or a type you pass in as a generic formal.
> Something like:
>
> generic
>     type Result_Type (<>) is limited private;
>     function Process (E : ET) return Result_Type is <>;
> function Generic_Query (Position : Cursor) return Result_Type;

This seems too specialized to me. Most of the time, it would make just as
much sense to write a function of the Element. Besides, this seems like it
would be illegal if AI-318 is passed as currently planned, since limited
unconstrained types will not be allowed to be returned. So there is a
contract issue here (having a function that has to be able to both
build-in-place and return-by-copy seems like a very nasty case for generic
sharing implementations).

In any case, we need to avoid "feeping creaturism" here. KISS definitely
applies!

****************************************************************

From: Pascal Obry
Sent: Wednesday, June 9, 2004  10:43 AM

One piece of feedback after migrating AWS to the AI302 reference implementation.
The functions Size and Length are really too confusing. I have at least twice
used the wrong one (using Size instead of Length). Length is ok, maybe Size
should be renamed Hash_Size or something like that.

For the record:

   function Size (Container : Vector) return Size_Type;
   -> returns the size of the hash table (number of buckets)

   function Length (Container : Vector) return Size_Type;
   -> returns the number of items in the vector

Also, as Size and Resize are low-level stuff I would put those routines at the
end of the package. Another solution would be to put such routines into a
child package. Thoughts ?

****************************************************************

From: Matthew Heaney
Sent: Wednesday, June 9, 2004  2:55 PM

Pascal Obry wrote:

> One piece of feedback after migrating AWS to the AI302 reference implementation.
> The functions Size and Length are really too confusing. I have at least twice
> used the wrong one (using Size instead of Length). Length is ok, maybe Size
> should be renamed Hash_Size or something like that.

It's not unlike for an array, which has both 'Length and 'Size attributes.


> For the record:
> 
>    function Size (Container : Vector) return Size_Type;
>    -> returns the size of the hash table (number of buckets)

No.  The Size of a hashed map container specifies the maximum length 
(number of items) before which automatic expansion of the internal hash 
table occurs.  It does *not* specify the number of buckets in the hash 
table.

(It is indeed the case that in the AI-302 reference implementation, 
function Size happens to return the number of hash table buckets, but 
that is a characteristic of that particular implementation.  It is not 
guaranteed to be the case for all implementations.)


>    function Length (Container : Vector) return Size_Type;
>    -> returns the number of items in the vector

Technically it's the "number of active elements," but let's not quibble.


> Also, as Size and Resize are low-level stuff I would put those routines at the
> end of the package. Another solution would be to put such routines into a
> child package. Thoughts ?

It's a bad idea.

****************************************************************

From: Pascal Obry
Sent: Wednesday, June 9, 2004  3:26 PM

What is a bad idea ? I have proposed 3 things :

- rename Size and keep Length

- move the Size and Resize to the end of the API

- move the Size and Resize routines into a child package

I hope that you at least see that Size/Length having the same prototype
is dangerous. It is even more dangerous that using Size instead of Length
can stay undetected for some time...

****************************************************************

From: Matthew Heaney
Sent: Wednesday, June 9, 2004  3:44 PM

I was referring to the suggestion in your last paragraph to make Size 
and Resize child subprograms.

****************************************************************

From: Pascal Obry
Sent: Wednesday, June 9, 2004  3:53 PM

Ok, I also think it is a bad idea; it was there for completeness :)

****************************************************************

From: Tucker Taft
Sent: Wednesday, June 9, 2004  3:34 PM

How about "Maximum_Length" and "Set_Maximum_Length" in place
of Size and Resize?

****************************************************************

From: Pascal Obry
Sent: Wednesday, June 9, 2004  3:42 PM

Fine with me.

****************************************************************

From: Robert A Duff
Sent: Wednesday, June 9, 2004  7:23 PM

> What is a bad idea ? I have proposed 3 things :

I don't know Matt's opinion, but here's mine:

> - rename Size and keep Length

Good idea.  I think this is fairly important.

> - move the Size and Resize to the end of the API

Good idea.  Not important.

> - move the Size and Resize routines into a child package

Bad idea.

> I hope that you at least see that Size/Length having the same prototype
> is dangerous. It is even more dangerous that using Size instead of Length
> can stay undetected for some time...

Yes, I agree.  The name Size should be changed to something else,
something nobody would mistake for Length.

****************************************************************

From: Nick Roberts
Sent: Wednesday, June 9, 2004  9:06 PM

> How about "Maximum_Length" and "Set_Maximum_Length" in place
> of Size and Resize?

I endorse this suggestion. Specifically, I suggest:

(1) In package Ada.Containers, change:

   type Size_Type is range 0 .. <implementation-defined>;

to:

   type Count_Type is range 0 .. <implementation-defined>;

and all subsequent uses of Size_Type be renamed to Count_Type.

(2) In packages Ada.Containers.Vectors, Ada.Containers.Hashed_Maps, (and
Ada.Containers.Indefinite_Hashed_Maps,) change:

   function Size (Container : Vector|Map) return Size_Type;

to:

   function Maximum_Length (Container : Vector|Map) return Count_Type;

and change:

   procedure Resize (Container : in out Vector|Map;
                     Size      : in     Size_Type);

to:

   procedure Set_Maximum_Length (Container : in out Vector|Map;
                                 To        : in     Count_Type);

(3) Change all references to the term 'size' to 'maximum length'. For
example, change the second paragraph of the proposed A.17.2 from:

   A vector container object manages an unconstrained internal array, which
   expands as necessary as items are inserted. The *size* of a vector
   corresponds to the total length of the internal array, and the *length*
   of a vector corresponds to the number of active elements in the internal
   array.

to:

   A vector container object conceptually manages an unconstrained internal
   array, which expands as necessary as items are inserted. The *maximum
   length* of a vector corresponds to the total length of this conceptual
   internal array, and the *length* of a vector corresponds to the number
   of active elements within this array.

An alternative to 'maximum length' and [Set_]Maximum_Length throughout all
the above could be 'allocated length' and [Set_]Allocated_Length.

This issue has been argued about before. Some said that the term 'size'
clashed with the predominant existing usage of the term in connection with
the number of storage units used up by objects and program units. Others
said that many terms are 'overloaded' in the RM, and the term 'size' is
already used to mean other things in some places.

However, I quite strongly feel that an alternative term could easily be
chosen, and it would be very desirable to do so, to avoid just the kind of
confusion Pascal reported.

I must also add that I still think it is unjustified that the size/maximum
length of a vector or map is not permitted to be reduced by any
implementation. Specifically, I advocate that Resize/Set_Maximum_Length be
allowed (by the standard) to reduce the size/maximum length of a vector or
map, but that implementations are permitted to ignore such reductions if
they wish. In fact, I would suggest that the current wording (forbidding
such reductions) is silly in a way, because I doubt very much that there
will
ever be an ACATS test for it. On that basis, I also question the wording
"Resize sets the size of Container to a value which is at least the value
Size", which could more sensibly be changed to "Resize sets the size of
Container to approximately the value Size".

(4) I suggest the paragraph:

 If Size (Container) is equal to or greater than Size, the operation does
 nothing. Otherwise Resize sets the size of Container to a value which is
 at least the value Size, expanding the internal array to hold Size
 elements. Expansion will require allocation, and possibly copying and
 deallocation of elements. Any exceptions raised by these operations
 are propagated, leaving the container with at least the original Size,
 Length, and elements.

be changed to:

 Set_Maximum_Length sets the maximum length of Container to approximately
 the value To, expanding or contracting the internal array as required.
 Expansion or contraction may require allocation, and possibly copying and
 deallocation of elements. Any exceptions raised by these operations are
 propagated, leaving the length and active elements of the container
 unchanged.

and that the following AARM notes be changed appropriately, and that this
implementation permission is added:

 Implementations are not required to support the [changing|reduction] of the
 maximum size of a container by Set_Maximum_Length, in which case calls
 of this procedure should do nothing.

I favour the word 'changing', on the basis that Set_Maximum_Length is
probably never going to be ACATS tested for its effect on the size (maximum
length) of a vector or map.

(4) I also suggest that the concept of an 'expansion factor' is added to
vectors and maps. Each vector or map has its own expansion
factor associated with it, which is a value of the subtype
Ada.Containers.Expansion_Factor_Type, declared as follows:

   subtype Expansion_Factor_Type is Float range 1.0 .. [impl def];

Whenever a vector or map is expanded automatically, the value of its
expansion factor at the time may be used (but does not have to be) by the
implementation to determine the new maximum length of the container,
nominally by multiplying the current maximum length by the current expansion
factor.

The initial (default) value of the expansion factor of a container is
implementation defined, but its value may be retrieved and set by the
following subprograms:

   function Expansion_Factor (Container : Vector|Map)
         return Expansion_Factor_Type;

   procedure Set_Expansion_Factor
         (Container : in out Vector|Map;
          To        : in Expansion_Factor_Type);

****************************************************************

From: Robert A. Duff
Sent: Thursday, June 10, 2004  7:45 AM

Tuck says:

> How about "Maximum_Length" and "Set_Maximum_Length" in place
> of Size and Resize?

I don't really like "Maximum_Length", because there actually *is* no max
length -- the whole point is these things can grow arbitrarily large.
I believe STL calls them "capacity" and "reserve".

Pretty much anything would be better than "Size", for the reasons Pascal
stated.

****************************************************************

From: Matthew Heaney
Sent: Thursday, June 10, 2004  10:10 AM

Well, it does describe when expansion happens.  How about:

function Expansion_Length
   (Container : in Map) return Size_Type;

procedure Set_Expansion_Length
   (Container : in out Map;
    Length    : in     Size_Type);

****************************************************************

From: Alexander E. Kopilovich
Sent: Thursday, June 10, 2004  11:31 AM

Another proposition:

  function Extent  -- or Current_Extent

and

  procedure Set_Extent  -- correspondingly, Set_Current_Extent


But perhaps the best would be to say straight:

function Reserved_Length    -- or Reserved_Size

and

procedure Set_Reserved_Length  -- correspondingly, Set_Reserved_Size

****************************************************************

From: Tucker Taft
Sent: Thursday, June 10, 2004  1:42 PM

> But perhaps the best would be to say straight:
> 
> function Reserved_Length   
> 
> and
> 
> procedure Set_Reserved_Length 

I like these.  "Capacity" is pretty much a synonym for
"Maximum_Length".  Both need the word "Current" added to
make it clear these are expandable.
"Reserved" has just the right connotation.

By the way, I agree that there seems no reason not to
allow Set_Reserved_Length to specify a smaller length,
though we then want Reserved_Length to be allowed to
return a value larger than the value most recently set
by Set_Reserved_Length.  Which might argue for changing
the "set" procedure's name to "Set_Minimum_Reserved_Length"
and the function's name to "Actual_Reserved_Length" to
be crystal clear.

****************************************************************

From: Alexander E. Kopilovich
Sent: Thursday, June 10, 2004  5:13 PM

Perhaps for this purpose "Provide_Reserved_Length" would be even better
(again, more straight) than "Set_Minimum_Reserved_Length".

And additionally, as "provide" (unlike "set") is somewhat noncommittal about
the upper limit, there will be less need for the prefix "Actual_" before
"Reserved_Length".

****************************************************************

From: Nick Roberts
Sent: Friday, June 11, 2004  9:13 PM

I like these suggestions. I quite like 'Actual_Reserved_Length', but I think
it's not really necessary, since there is no other function
('Requested_Reserved_Length' or some such) for it to be contrasted with.

Perhaps a consensus is coming close to:

- rename the term 'size' as 'reserved length';

- rename the 'Size' functions as 'Reserved_Length';

- rename the 'Resize' procedures as 'Request_Reserved_Length'.

I would also like to suggest:

- rename the type 'Size_Type' as 'Count' or 'Count_Type'.

My justification for this is that the term 'size' is mainly used in
connection with storage units, so some potential for confusion would be
easily avoided by a different name, and a type 'Count' fulfilling a very
similar role is declared in the Ada.*_IO packages.
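
To make the combined renaming concrete, here is a sketch of the resulting
declarations (shown for vectors only; maps would be analogous):

   type Count_Type is range 0 .. [impl def];

   function Reserved_Length (Container : Vector) return Count_Type;

   procedure Request_Reserved_Length
     (Container : in out Vector;
      Length    : in     Count_Type);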

I would like to reiterate my suggestions that:

- the Request_Reserved_Length (Resize) procedures are permitted to reduce
the reserved length (size) of a container, but that in any case (reduction
or expansion) any actual change to the reserved length (size) remains
implementation defined;

- an explicit expansion factor is supported, as in my previous post.

My justification for the first is that it would not be sensible to formally
test whether an implementation obeyed a more stringent definition. The
Reserved_Length (Size) functions should return the actual reserved length
(size), but again, it would probably not be sensible to try to formally test
this (and it may not be possible).

My justification for the second is that there will often be situations where
the user (of the proposed container packages) knows better than the
implementation what the expansion factor should be, and in such cases the
implementation default for deciding by how much to expand a container
(whether by a simple factor or some other method) is likely to be very
inappropriate. [Sorry about numbering this point '(4)' in my previous post;
it should have been '(5)'.]

****************************************************************

From: Michael F. Yoder
Sent: Saturday, June 12, 2004  9:18 AM

>Perhaps a consensus is coming close to:
>
>- rename the term 'size' as 'reserved length';
>
>- rename the 'Size' functions as 'Reserved_Length';
>
>- rename the 'Resize' procedures as 'Request_Reserved_Length'.

If there is such a consensus, I'll add my support to it.  These seem 
like good ideas.

>I would also like to suggest:
>
>- rename the type 'Size_Type' as 'Count' or 'Count_Type'.
>
>My justification for this is that the term 'size' is mainly used in
>connection with storage units, so some potential for confusion would be
>easily avoided by a different name, and a type 'Count' fulfilling a very
>similar role is declared in the Ada.*_IO packages.

I agree.

>I would like to reiterate my suggestions that:
>
>- the Request_Reserved_Length (Resize) procedures are permitted to reduce
>the reserved length (size) of a container, but that in any case (reduction
>or expansion) any actual change to the reserved length (size) remains
>implementation defined;

I strongly agree.  Requiring that the user write the size reduction code 
via a copy forecloses even the possibility of a reduction that avoids 
copying.

>- an explicit expansion factor is supported, as in my previous post.
>
>My justification for the first is that it would not be sensible to formally
>test whether an implementation obeyed a more stringent definition. The
>Reserved_Length (Size) functions should return the actual reserved length
>(size), but again, it would probably not be sensible to try to formally test
>this (and it may not be possible).
>
>My justification for the second is that there will often be situations where
>the user (of the proposed container packages) knows better than the
>implementation what the expansion factor should be, and in such cases the
>implementation default for deciding by how much to expand a container
>(whether by a simple factor or some other method) is likely to be very
>inappropriate. [Sorry about numbering this point '(4)' in my previous post;
>it should have been '(5)'.]

I'm less enthusiastic about the expansion factor, but I don't oppose it.

****************************************************************

From: Robert A. Duff
Sent: Monday, June 14, 2004  8:55 AM

Mike Yoder wrote:

> Nick Roberts wrote:
> 
> >- the Request_Reserved_Length (Resize) procedures are permitted to reduce
> >the reserved length (size) of a container, but that in any case (reduction
> >or expansion) any actual change to the reserved length (size) remains
> >implementation defined;
> >
> I strongly agree.  Requiring that the user write the size reduction code 
> via a copy forecloses even the possibility of a reduction that avoids 
> copying.

The STL guarantees that the reserved size is at least that requested.
This is important because it means that cursors/iterators that
point into the data structure do not become invalid while
appending (up to that reserved size).

****************************************************************

From: Nick Roberts
Sent: Sunday, June 20, 2004  1:53 PM

However, the semantics required by the current AI-302 is clearly different.

The relevant wording is:

   A Cursor value is *ambiguous* if any of the following have occurred
   since it was created:
     * Insert or Delete has been called on the vector that contains the
       element the cursor designates with an index value (or a cursor
       designating an element at such an index value) less than or equal
       to the index value of the element designated by the cursor;
     * The vector that contains the element it designates has been
       passed to an instance of Generic_Sort.

and:

   A Cursor value is *invalid* if any of the following have occurred
   since it was created:
     * The vector that contains the element it designates has been
       finalized;
     * The vector that contains the element it designates has been
       used as the Source or Target of a call to Move;
     * The element it designates has been deleted.

   The result of "=" or Has_Element is unspecified if it is called with
   an invalid cursor parameter. Execution is erroneous if any other
   subprogram declared in Containers.Vectors is called with an
   invalid cursor parameter, or if the cursor designates an element in
   a different vector object than the appropriate one specified in the
   call.

   AARM Notes:
   The list above (combined with the bounded error cases) is
   intended to be exhaustive. In other cases, a cursor value
   continues to designate its original element. For instance,
   cursor values survive the appending of new elements.
   End AARM Notes.

Cursors are not permitted to become ambiguous or invalid solely because of
internal copying (as a result of automatic extension).
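
A small example of the property being relied on (illustrative only, assuming
an instantiation with an integer Element_Type and the usual Append, Last and
Element operations of the vector package):

   V : Vector;
   C : Cursor;
begin
   Append (V, New_Item => 1);
   C := Last (V);                   --  designates the element just added
   for I in 2 .. 1_000 loop
      Append (V, New_Item => I);    --  may force internal reallocation
   end loop;
   --  Under the wording above C is still neither ambiguous nor invalid:
   --  only Insert/Delete at or before its index, Generic_Sort, Move,
   --  deletion of its element, or finalization can spoil it, and
   --  appending is none of those.
   pragma Assert (Element (C) = 1);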

****************************************************************

From: Randy Brukardt
Sent: Wednesday, June 23, 2004  9:43 PM

Right. That's an important property: cursors do not become invalid because
of an action that is outside of the user's control. And memory management in
a container is outside of the user's control.

Resize (I forget the new name we settled on, so I'll use the old one for
now) is purely a performance enhancing routine. The only requirement is that
Size (ditto on the name) returns the value most recently passed into Resize,
or something larger. There's an AARM note suggesting to implementors that
Resize allocate at least the specified memory, but of course that is
untestable and cannot be specified in normative language of the standard.
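
So the only normatively testable property is roughly this (old names, with
the parameter name taken from the wording quoted earlier):

   Resize (V, Size => 1_000);
   --  Whatever storage was actually reserved, the reported size must
   --  be no less than the value most recently requested.
   pragma Assert (Size (V) >= 1_000);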

****************************************************************

From: Simon Wright
Sent: Friday, June 11, 2004  3:03 AM

> possibility.) Reverse_List (which probably should just be called
> "Reverse")

If it wasn't a reserved word!


****************************************************************
