Version 1.1 of ais/ai-30302.txt

Unformatted version of ais/ai-30302.txt version 1.1
Other versions for file ais/ai-30302.txt

!standard A.17          04-02-13 AI95-00302-04/00
!class amendment 04-01-14
!status work item 04-01-14
!status received 04-01-14
!priority Medium
!difficulty Hard
!subject Container library
!summary
This is a dummy AI created solely to hold the volumious mail on this topic. See AI-302-03 for the actual proposal.
!problem
!proposal
!wording
!example
--!corrigendum A.17
!ACATS Test
!appendix

[Editor's note: For mail earlier than February 8, 2004, see AI-302-3.]

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004  7:33 AM

I suggest the use of controlled types if you want implicit
levels of indirection in the keys or the elements.  Having the
container worry about storage management issues relating to elements
or keys significantly increases their complexity.  We very much
want these containers to be straightforward to define and use.
They are definitely not the final answer, but more the initial
answer -- the 20% that can handle 80% of the problems.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  12:23 PM

>I suggest the use of controlled types if you want implicit
>levels of indirection in the keys or the elements.

That is exactly the problem. The user is forced to control. Waste of
time. And bug prone. The right controlled behaviour is very hard to get.
How many times is Finalize called?

>  Having the
>container worry about storage management issues relating to elements
>or keys significantly increases their complexity.

If you mean inneficiency, no, at least not significantly: see the
variant unit solution. If you mean source code complexity, sure, a bit,
but so what?

>  We very much
>want these containers to be straightforward to define and use.
>They are definitely not the final answer, but more the initial
>answer -- the 20% that can handle 80% of the problems.

With only definite elements I don't believe in the 80% figure. Just
think: don't you need heterogeneous arrays all the time? For class-wide
programming for example? And logical records? And words, texts,
pictures, all sort of variable length stuff?

BTW this is the kind of "resistance" I was talking about. No technical
arguments really. Just a vague downsize whish. The pointer tradition maybe.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  12:41 PM

Just to make some things clear. I began championing indefinite elements
long ago. Wrote the proposals. They met the "resistance". I let it be. I
assumed the proposals had been viewed and were rejected. The recent
discussion made me wonder if the proposals had really been seen. So I
stepped in just to make sure. I don't want to discuss the issue itself.
That has been done. See the proposals (my Bases document stored in
alternative 1, my proposed Annexes in alternative 2, discussions in
ASCLWG, CLA and here). When I say I won't rediscuss the issue it doesn't
mean I won't give focused explanations here. I'll be glad to do it.
Thanks a lot.

****************************************************************

From: Tucker Taft
Sent: Sunday, February 8, 2004  4:25 PM

> ...
> >  We very much
> >want these containers to be straightforward to define and use.
> >They are definitely not the final answer, but more the initial
> >answer -- the 20% that can handle 80% of the problems.
> >
> With only definite elements I don't believe in the 80% figure. Just
> think: don't you need heterogeneous arrays all the time? For class-wide
> programming for example? And logical records? And words, texts,
> pictures, all sort of variable length stuff?

But in almost all of these cases, I would not want to be copying
these large objects around.  I (as a user of the abstraction) would want
to control storage allocation of the objects.  That would imply
I would be using access types explicitly, or define an abstraction
which used a controlled type, with perhaps reference counting
of a pointed-to part.

> BTW this is the kind of "resistance" I was talking about. No technical
> arguments really. Just a vague downsize whish. The pointer tradition maybe.

Sorry if my arguments seem vague.  I would be happy to engage
in a long discussion about this design choice.  I would want
the container to take over storage allocation only in the
case where it is "uniquifying" the objects, and I expect
to "leave" the objects in the container indefinitely, and pass
around keys (essentially pointers or ids) for the objects.
The example of the "string table" comes to mind, where in
a word or language processing tool, the first thing you do
is uniquify all the strings, and then only deal with indices
into the string table thereafter.  This sort of table generally
never goes away, and just grows slowly as new unique strings occur.

The string mapping was included precisely for this application,
as it seems important and common.  However, for other cases, we
felt it was better to let the programmer control storage allocation,
so that the amount of allocation, copying, and deallocation of large,
variable-sized objects could be minimized, and most importantly,
under control of the user.

Please don't confuse "resistance" with simply a difference of
opinion.  We spend long hours debating incredible minutiae
in the ARG meetings.  We rarely take the "easy" route.
We may not document our discussions publically as well as we
should, but rest assured we have a vigorous debate.
The minutes of ARG meetings, which tend to be very good relative
to most minutes I have seen, are nevertheless able to document
only the "tip of the iceberg" of the discussion.

****************************************************************

From: Marius Amado Alves
Sent: Sunday, February 8, 2004  6:55 PM

Thanks for taking the trouble to review this issue. I'll try to summarize:

You feel the user want to control allocation himself. Sometimes, yes. In
those times, he just does it. The indefinite element feature won't stand
on its way. I feel most of the times the user does NOT want to bother
with memory management. He will love to have indefinite elements. I
think this is the principal difference between us. You think all or most
users prefer to control allocation themselves. I'm conviced they don't,
and they'd be really happy not to have to.

You fear loss of efficiency due to copying. Containers are by-reference,
so you must be referring to copying of elements. But doesn't that happen
just exactly when it has to, be it in the library or in the user code?
Assuming a well designed library, one which moves only references, not
the things, as you yourself notice. I've done proof-of-concept
implementations of this for alternative 2. The process and associated
discussion with Matt was recorded on the ASCLWG forum. The code is still
online I think, but needs cleansing.

****************************************************************

From: Jeffrey Carter
Sent: Monday, February 9, 2004  1:01 AM

Randy Brukardt wrote:
>
> Huh? You've said, in effect, that the performance isn't good enough
> for applications where the performance doesn't matter. That's a
> pretty goofy statement!

Actually, you originally said something like that. You have said

1. That the vector component should only be used by applications where
performance doesn't matter.

2. That the difference in performance between possible implementations
of vector may be critical to applications that use it.

If performance doesn't matter to these applications, then the
restriction on implementations should be removed. However, I agree with
you that even applications that are suitable for the use of standard
components may find the performance difference between different
implementations critical.

> The problem I see is a lot of people are looking far too closely at
> tiny pieces of abstractions.  You might have a queue or a list as
> part of a large abstraction, but they're pretty much useless by
> themselves. And given that creating a queue or stack (both of which
> have only two operations, both trivial!) would take 3 minutes max, it
> makes no sense to use a complex (and necessarily slow) container
> library for just that -- indeed, it probably would be more work to
> use a container than the 3 minutes.

I have seen a number of these "3-min" structures, and many of them have
subtle errors. These are not beginner mistakes, either; handling dynamic
structures seems to be something that a segment of developers have
difficulty understanding. That these structures are not as easy to
implement as they seem is part of the reason why I think a list
component should be part of a standard library.

Regarding Size and Resize, you wrote:

> That's no different than many of the attributes in Ada, which (if set),
> always return the values that they were set to. But what the compiler does
> with those values is (almost) completely implementation-defined.

There is a difference between a compiler directive and an operation of a
package. The latter must have well defined behavior that is not
implementation defined.

> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.

What does Size return if Resize has not been called?

This description does not agree with the specification in the proposal.
Size "Returns the length of the internal array." Clearly the
implementation must have something that has a length, independent of the
logical length of the value stored in the vector, for Size to return.

Resize "allocates a new internal array whose length is at least the
value Size". Clearly the implemention must allocate a new something with
a new length. What the container does with the new size is not up to it;
it is specified fairly clearly.

The operations, as specified, are pretty meaningless except for an array
implementation.

If the intention is as you described, then the operations appear to be
useless, and should be eliminated. If the intention is as specified,
then these operations are too tied to the implementation, and should be
eliminated.

> I much prefer the vision of this containers library, where the only
> containers included are those that are large, complex, multi-purpose,
> and have a clear abstraction.

The vision I see seems to be muddied. The containers are poorly named,
poorly specified, and confuse abstractions with their implementations.

My intention is to help assure that Ada has as good a container library
as possible in the time available. I assume that the purpose of
presenting the proposal to the Ada-Comment list is to attract comments
on how it could be improved, and there is time to make such comments and
have them considered. I have invested most of this weekend in describing
specific ways I think they could be improved. In many cases I have
provided concrete suggestions for alternative wording, which I
present here. I hope the result will be useful to the committee.

I have already presented my thoughts on changing the type names used to
be consistent with the rest of the standard. I will use the type names
from the proposal here, however, to avoid confusion.

Vectors

The introductory text to Vectors does not make it clear that this is an
extensible array (EA). After reading the package spec, I initially
thought this was a list, perhaps with an unusual implementation. I doubt
if I am special, so I expect such an interpretation from many readers.
After reading the entire section, I encountered the Implementation
Advice that a vector is similar to an array and realized that this was
an EA. An EA is a useful component that I will be happy to see in the
standard.

However, I think it is a disservice to Ada for readers to have to read
the entire section to know what they're looking at. Borrowing from the
introductory text for Strings.Unbounded, which is a special case of an
extensible array, I suggest something along the lines of: "An object of
type Vector_Type represents an array, indexed by Index_Type with
components of Element_Type, whose low bound is Index_Type'First and
whose length can vary conceptually between 0 and the number of values in
Index_type."

The wording used by Strings.Unbounded should serve as a guide to how to
word the text here. Operations in Strings.Unbounded are defined by
analogy to String operations; operations in Vectors should be defined by
analogy to array operations.

Even with such wording changes, however, it is still going to be
difficult for the reader to find what he wants. Someone looking for
vectors is going to be disappointed to find EAs, and someone looking for
an EA is unlikely to look at something named Vectors. Ada should be able
to do better than that. Extensible_Arrays, Flexible_Arrays, and
Unbounded_Arrays have already been suggested by various people here;
given that we already have Unbounded_Strings, Unbounded_Arrays may be
the best choice.

I am not the first to note that Annex A is one of the most accessible
part of the ARM, and is frequently read by those using the standard
library. It makes sense to recognize this and word these sections as a
users' guide where possible. So, if the ARM gains a mathematical library
of matrices and vectors, we should add to it a comment that those
looking for the kind of vector provided by the STL of C++ or Java's
library should look at package Ada.Containers.Unbounded_Arrays (A.17.2).
In the introductory text to the section, we should mention that an
Unbounded_Array is equivalent to the container called Vector in the STL
of C++ or Java's library (similar to the comment about pointers in 3.10).

Index_Subtype is never used, so it should be eliminated.

Size and Resize were discussed above.

First (Vector) is always Index_Type'First, so it should be a constant.

We iterate over an array A by

for I in A'range loop
    -- use A (I)
end loop;

By analogy, we should iterate over an EA by

for I in First .. Last (EA) loop
   -- use Element and Replace_Element at I
end loop;

Front and Back, therefore, seem to be unnecessary, and may be deleted.
This has the additional advantage that it eliminates concern about
Index_Type'Base needing a greater range than Index_type, and we could
remove the assertion.

Writing prematurely when I thought this was a list, I suggested an
iterator for vectors. I retract that suggestion.

It could be useful to provide an operation to add an item at an index >
Index_Type'Succ (Last (Vector) ) without assigning to the intervening
positions. The component doesn't currently allow this. Possible wording:

procedure Append (Vector   : in out Vector_Type;
                   Index    : in     Index_Type;
                   New_Item : in     Element_Type);

If Index <= Last (Vector), this procedure has the same effect as
Replace_Element (Vector, Index, New_Item).

Otherwise, the length of Vector is extended so that Last (Vector) =
Index, and New_Item is assigned to the element at Index. No value is
assigned to the elements at the new positions with indices in
Index_Type'Succ (Last (Vector) ) .. Index_Type'Pred (Index).

There should be some way to indicate that this last use of "Last
(Vector)" refers to the value before the call. I don't see an easy way
to do that and welcome suggestions.

This leaves the problem that Natural is used for the length of a vector
and the counts of inserted or deleted elements, meaning that index types
with more values than Natural cannot use some index values. This is
avoided in Ada.Text_IO, for example, with a type specific for that purpose.

However, this is really a general problem, and a general solution might
be advisable. There are no predefined modular types in Standard, so we
might want to add

type Maximal_Count is mod implementation-defined;

Maximal_Count'Modulus is the largest power of 2 that may be used as the
modulus of a modular type.

We could add a note that this means Maximal_Count'Modulus =
System.Max_Binary_Modulus, for clarity. I presume it would be
inappropriate to reference System in Standard.

If that's not acceptable, we could add somewhere in the hierarchy,
perhaps in package Ada itself

type Maximal_Count is mod System.Max_Binary_Modulus;

[Would we also like subtype Positive_Maximal_Count?]

New packages could then use Maximal_Count rather than Natural for this
sort of thing. Existing packages could be augmented with parallel
operations that use Maximal_Count.

Maps

Maps is fairly well specified. I think the introductory wording should
again be modified: "The user can insert key/value pairs into a map, and
then search for and delete values by specifying the key. An object of
type Map_Type allows searching for a key in less than linear time."

This is a hashed map and specifies an implementation based on a hash
table. This is appropriate, since a hashed map requires the user to
provide a hash function that is not needed by other implementations.
However, I think the name should reflect this (Hashed_Maps) so that we
don't unnecessarily restrain the existence of other forms of Maps.

Since the exact nature of the underlying hash table is implementation
defined, the user doesn't have the information needed to choose an
appropriate size for it. Size and Resize therefore seem inappropriate. I
can hope that users will realize they lack the information to use them
meaningfully, and never call them.

The initial text after the spec seems unnecessarily restrictive of the
implementation. Since the implementation knows best the details of the
hash table, it should determine the initial size of the table.

I agree with the open issue on Swap. I see little use for this operation
on any of the components.

It seems inappropriate to require Insert to resize the hash table. The
implementation should know best when and how to resize the table.

While it's appropriate to discuss nodes as containers of key/value
pairs, it unnecessarily restricts the implementation to talk of nodes
being allocated and deallocated. It should be adequate to say such
things as "Insert adds a new node, initialized to Key and New_Item, to
Map" and "Delete deletes the node from Map".

I don't understand why the string-keyed maps exist, since they are
equivalent to a map with an unbounded string key. The implementation
would have to store the provided key in an appropriate unbounded string,
or duplicate the functionality of unbounded strings. Duplicated
functionality is a bad idea. Moving the conversions to and from
unbounded strings into a special component doesn't seem worth the added
complexity.

Sorted_Sets

The wording here is similar to that for vectors. The introductory text
does not describe the abstraction that the package implements.
Proceeding to the package spec, the reader will probably be puzzled by
the lack of basic set operations such as union and intersection. The
description of the operations that follows does nothing to alleviate the
confusion. A newcomer to the language may very well wonder what's wrong
with these Ada people. Only at the very end of section do we discover
that this is a structure that provides searching in O(log N) time.

Clearly the choice of Set as the name is confusing and misleading, but
I'm not sure what to suggest as an alternative. Something like
Fast_Search seems to imply that it is an algorithm, not a structure.
Perhaps Sorted_Searchable_Structure would work, but I'm not very happy
with it. Suggestions are welcome.

The introductory text needs to identify what the component is: "An
object of type Searchable_Structure represents a data structure that can
be searched in less than linear time."

Given that this is a searchable structure, the operations seem reasonable.

The descriptions of the operations clearly require an implementation
that performs dynamic allocation and deallocation. This is an
unnecessary constraint on the implementation. A binary search is O(log
N), but is not allowed by the current specification. These descriptions
should be modified along similar lines to the suggestions for maps.

If the package does not use "=" for elements, why does it import it? Why
doesn't the package use "="? It's not clear why it should use
"equivalence" rather then equality.

The package Ger_Keys turns a searchable structure into a map. A
searchable structure is a common implementation of a map. Providing an
alternive implementation of a map seems is fine, provided that the name
indicates that it is a map. Sorted_Map might be a better name.

It's quite easy to implement a map with a searchable structure
component, so it would be better if the map was another component at the
same level as the hashed map. I would have no objection to the standard
specifying that this map be implemented with an instantiation of the
searchable structure component; it would make the specification of the
map easy. The primary justifications for this change are that it allows
the user who wants a map based on a searchable structure to obtain it
with a single instantiation, rather than the two required as it stands,
and it allows both maps to have similar interfaces, which they do not
have with the existing proposal.

I'm glad the proposal recognizes that both searchable structures and
maps based on them are useful components, even if they go to great
efforts to disguise what they are.

This discussion of the searchable structure and the map based on it
seems to indicate a basic design problem with the hashed map component.
A hash table is not trivial to implement correctly. There are uses for
hash tables other than maps. As it stands, the user who wants a hash
table must create one, duplicating the effort performed for the map, and
increasing the likelihood of errors.

Just as both a searchable structure and a map based on it are desirable,
so both a hash table and a map based on it would be a good idea. The
user who requires a hash table but not a map could use one that has been
tested by many users, reducing both effort and likelihood of errors.
Thus I suggest that the hash table be turned into a component. As with
the map based on a searchable structure, I would have no problem with
the standard specifying that the hashed map be implemented using the
hash table component.

If we can only have one of the hash table or the hashed map components,
I would argue for the hash table, since it is easy to implement a map
given a hash table, but difficult to implement a hash table given a map.

Providing maps based on other packages allows the standard to
demonstrate a layered approach to creating abstractions. Since creating
useful abstractions is a basic process in software engineering, perhaps
the idea might rub off on some readers.

If this suggestion is accepted, the library would increase from three to
five: an extensible array, a hash table, a searchable structure, a map
based on the hash table, and a map based on the searchable structure.
That still seems a fairly minimal library, provides the same
functionality as the proposal, and adds some additional useful
functionality without significant extra effort.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  5:40 AM

> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...

Actualy some operation will become faster. Like instet in the midle. Also
append operation which need to extend internal storrage become faster.

At least when the stored data is larger then an access - which should be
80% of the cases.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  8:39 AM

Randy Brukardt wrote:

> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.

It returns the value chosen by the implementation, which can be at least
the size specified.


> The only real requirement here is O(1) element access (which prevents the
> use of a straight linked list).

Yes, that is correct: you cannot use a linked list to implement a vector.

Indeed, if a vector container were implemented as a linked list then it
wouldn't be named "vector"; it would be named "linked list" instead.

My original proposal had 3 kinds of sequence containers: vectors,
deques, and (linked) lists.  There were 3 because each has different
time and space properties.

I would have liked having a list container in the final committee
report, since that's the most natural container for use as a queue.  (I
probably use lists more often than any other container, for exactly that
reason.)  But the size of the proposal had to be reduced somehow.


> Janus/Ada will probably use an array of pointers (or possibly array of
> arrays of pointers); we're going to be (implicitly) allocating the elements
> anyway, we might as well do it explicitly and take advantage of that to make
> Insert/Delete/Sort (and any expansions) much cheaper (presuming the elements
> are bigger than scalar types). An array of arrays of pointers is even
> better, because insertion cost is bounded by the maximum size of an array
> chunk -- but there is more overhead and complexity, so I'd like to see some
> real uses before deciding on an implementation.

My reference implementation just uses an unbounded array internally.  It
sounds like you have some other implementation ideas.

I have the maps done, and I'll host the new reference implementation
this morning (Mon, 9 Feb).


> Note that a pure list component has no real opportunity for "better"
> implementations, and indeed, any implementation on Janus/Ada would suffer
> from "double" allocation.

But a list component has O(1) insertion and deletion at any position.  A
vector is O(1) only at the back end.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:09 AM

Martin Dowie wrote:

>>The only sequence container in the proposal is a vector, which doesn't
>>have a passive iterator.  Again, I recommend just using a loop:
>
> I suspect the first thing I will do is add an extra child generic subprogram
> Ada.Containers.Vectors.Iterate! :-)

You might not have to.  Since there seems to be interest, I added the
following two declarations to the reference implementation:


    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
         (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


The latest version of the reference implementation is available at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:14 AM

Martin Krischik wrote:

>>The user can easily code a queue in terms of a Vector (that's one of the
>>uses of Insert!). We dropped the list component because it had an identical
>>interface to the Vector component, but was less flexible (no computed O(1)
>>access).
>
> True enough. But if you wanted a build generic queue on top of the vector the
> tag should not be hidden from view. Otherwise one need to repeat all the
> access methods instead of just renaming the one provided from the parent
> package.
>
> In fact the hidden tag is the one feature which I realey dislike in charles.

You mean the type tag?  The components are tagged because I needed
controlledness for automatic memory management. They are tagged for no
other reason, and Charles is specifically designed using static, not
dynamic, polymorphism.

For the record I don't think it's realistic to use a vector as a queue
anyway, since deletion from the front end of a vector is O(n).

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:22 AM

Martin Krischik wrote:

> Passive Iterators should allways provide the fastes mean to iterate over the
> whole container. They should do so by knowing the internals of the container.

That is correct.  A passive iterator will usually beat an active
iterator.  But for a vector it probably doesn't make any difference.

However, the latest reference implementation does have passive iterators
for the vector, that look like this:

    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
         (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


The latest version of the reference implementation is available at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

> Of course it only matters in advanced container with B-Trees or AVL-Trees as
> as internal structure. But I have only seen those in IBM's Open Class Library
> (which is far better the the STL).
>
> But there are no advanced containers in AI 302.

The sorted set is implemented using a balanced tree.  The reference
implementation uses a red-black tree, but I suppose an AVL tree would
work too.

The maps are implemented using a hash table.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:30 AM

Stephen Leake wrote:

> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite? Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?

Because that would punish users that have definite key types.

Also, type String isn't just any indefinite type.  It's an array.

The reference implementation for String_Maps looks like this:

    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type (Key_Length : Natural) is
       record
          Key     : String (1 .. Key_Length);
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

> I don't see a discussion of this in AI-302-03/01.

There is a paragraph in there explaining why we have a dedicated maps
whose key type is String.


> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.

No.  The parameter of the Resize operation specifies a hint about the
future length of the container, which is subtype Natural.

> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!

The assumption is that a container will always have fewer the
Integer'Last number of elements.  (On a 32 bit machine that's 4.2
billion values...)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:34 AM

Randy Brukardt wrote:

> We definitely expect that the strings container will use a purpose-built
> data structure for storing strings, not some general indefinite item
> capability. Ways to compactly and efficiently store sets of varying size
> strings are well known and commonly used.

I didn't do anything special here.  The internal node declaration for
String_Maps looks like this:


    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type (Key_Length : Natural) is
       record
          Key     : String (1 .. Key_Length);
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;


I have hosted the latest version of the reference implementation at my
home page:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:49 AM

Randy Brukardt wrote:

>>There's also a problem if Natural'Last < Index_Type'Last; you
>>can't have a vector that contains every index!
>
> Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you
> want the Size and Resize operations to take a numeric type that contains
> zero -- and certainly Index_Type is not that. Index_Type could be a subtype
> of an enumeration type or a subtype of a modular type (neither of which can
> contain zero) or a subtype of an integer type not containing zero.
>
> We had a short, inconclusive discussion about whether the index type ought
> to be range <> rather than (<>) (because enumeration and modular types fail
> the assertion and thus aren't directly usable), but that still doesn't
> guarantee a zero. Moreover, if the integer type has negative numbers, then
> the Length of the vector could be larger than Index_Type'Last.

Clearly, if the container is empty, and Index_Type'Base'First =
Index_Type'First, then evaluation of function Last will raise
Constraint_Error.

The issue is whether elaboration of a vector container object can raise
CE if the Index_Type'Base'First = Index_Type'First.

There's no reason why we should punish users whose generic actual index
subtype has Index_Type'Base'First = Index_Type'First, since they can
always defend against CE like this:

    if not Is_Empty (V) and then Last (V) = X then

In fact my reference implementation doesn't require that
Index_Type'Base'First < Index_Type'First, so the assertion in the spec
is somewhat spurious.

I would prefer to weaken the precondition and allow
Index_Type'Base'First = Index_Type'First, but it's really up to
implementors, because allowing that condition will constrain
implementation choices.

> So I don't see a great solution. I wondered about using "Hash_Type" here (it
> has the correct properties), but that seems like a misuse of the type (and a
> bad idea in a library that most Ada programmers will read - you want to show
> them good style in standard libraries).

As I mentioned in my previous message, Resize specifies a hint about the
future number of elements in --that is, the length of-- the container.
My assumption is that no container will ever have more than Integer'Last
number of elements.

If that assumption is incorrect, then maybe the container can be allowed
to grow internally to more than Integer'Last number of elements, but can
only report a maximum value of Integer'Last.

Subtype Natural is the correct choice for the vector Resize operation.

I think the ARG wants to use Hash_Type for Resize for the maps.  My
reference implementation still uses Natural.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:40 PM

> Clearly, if the container is empty, and Index_Type'Base'First =
> Index_Type'First, then evaluation of function Last will raise
> Constraint_Error.

Well, some might think it's clear, but some might think Last returns
First-1, which for a modular type is 'Last.  I'm in favor of making the
Index_Type be "range <>", and also requiring that elaboration of an
instance raise an exception if 'First = 'Base'First.  That would avoid
all these anomalies.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  6:24 PM

That seems reasonable.  It was questionable whether we really needed

  type Index_Type is (<>);

so maybe these issues will require that

  type Index_Type is range (<>);

This is probably good enough.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:53 AM

Randy Brukardt wrote:

> So, a passive iterator will only be faster in complex containers (where you
> have to separate the  Element and Successor functions). For a Vector (where
> the language already has the needed iteration mechanism built-in), it's
> going to be slower (or, if you're really lucky, the same speed) and it
> certainly is a lot harder to write.
>
> So I think having it on Vector would simply be for consistency; you'd never
> actually use it if you know you're dealing with a Vector.


As I mentioned in one of my previous messages, the reference
implementation now has a passive iterator like this:

    generic
       with procedure Process
         (Element : in Element_Type) is <>;
    procedure Generic_Constant_Iteration
      (Vector : in Vector_Type);

    generic
       with procedure Process
        (Element : in out Element_Type) is <>;
    procedure Generic_Iteration
      (Vector : in Vector_Type);


There seems to be interest in a passive iterators for vectors, so we
might as well include it.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:00 AM

Randy Brukardt wrote:

> At which point, you *equal* the performance of the active iterator. And only
> if *everything* goes right. The OP claimed that the passive iterator would
> always have better performance, and that's certainly not true for the vector
> container. I doubt that it would be true for the Map container, either. It
> could be true for a complex container, but those aren't commonly used.

The vector is arguably a borderline case, but we should just include a
passive iterator.  The latest version of the reference implementation
has them for vectors, too.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

For both a (hashed) map and (sorted) set, a passive iterator is likely
to beat an active iterator (other things being equal, of course).

For a map, the reason is that you can just use a loop internally, to
keep track of which bucket you're visiting.  In an active iterator, you
have to compute the hash value again to find the next bucket.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:15 AM

>I suspect the first thing I will do is add an extra child generic
>subprogram Ada.Containers.Vectors.Iterate! :-)

This probably won't be necessary.  I added passive iterators to the
vector reference implementation.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:13 AM

> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...

The reason is that (in what I imagine is a typical implementation)
allowing the key to be indefinite would have drastic performance
implications.

The internal node of the map reference implementation looks like this:


    type Node_Type;
    type Node_Access is access Node_Type;

    type Node_Type is
       record
          Key     : aliased Key_Type;
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

I can declare the key as a record component directly, because the formal
key type is definite.  Were we to allow indefinite key types, then we
would have to do something like:

    type Node_Type;
    type Node_Access is access Node_Type;

    type Key_Access is access Key_Type;

    type Node_Type is
       record
          Key     : Key_Access;
          Element : aliased Element_Type;
          Next    : Node_Access;
       end record;

which implies allocating the key object separately from allocation of
the node itself.  This would unfairly punish users that have a definite
actual key type (as Integer or whatever).

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

If you want an indefinite key type, then allocate the key object
yourself and instantiate the component using the key access type.  This
shouldn't be a problem since the map object is typically part of some
higher-level abstraction anyway, so you can hide the allocation and map
manipulation from the users of that higher-level abstraction.

See the !examples section of the proposal for more details.

****************************************************************

From: Simon J. Wright
Sent: Monday, February 9, 2004  11:37 AM

> The internal node of the map reference implementation looks like this:

Does the aliasing of Element carry any implications for Element_Type?
I am thinking of the use of discriminated types, even with defaulted
discriminants, where aliasing forces the object to be constrained.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  11:48 AM

It means you can't instantiate the container using a
default-discriminated element type.

This is the same problem you have when trying to declare a
default-discriminated record on the heap, or as aliased on the stack.

The solution in all cases is to use a wrapper type sans discriminant,
and instantiate the component using the wrapper type as the element type.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  2:40 PM

This seems like a real issue.  Either the AI needs to specify that
default-discriminated record "don't work", as it were, or the
implementation needs to do the record-wrapping.

Tucker and I have run into this issue in our current project (I think I
wrote a container package, and Tucker instantiated it like that!), and it
wasn't entirely obvious what the best solution was.

****************************************************************

From: Gary Dismukes
Sent: Monday, February 9, 2004  2:49 PM

> It means you can't instantiate the container using a
> default-discriminated element type.

Not stated quite right -- you can instantiate the container with
such a type, but it might not work right.  You might get mysterious
exceptions propagating out of operations if the implementation
reassigns to an Element component in a node.

> This is the same problem you have when trying to declare a
> default-discriminated record on the heap, or as aliased on the stack.
>
> The solution in all cases is to use a wrapper type sans discriminant,
> and instantiate the component using the wrapper type as the element type.

I think that's not an acceptable answer in this case.  These aliased
element components are part of the implementation.  The user shouldn't
need to know about them and it's an abstraction violation in my opinion
if the user is forced to wrap his element type.  Instead it would seem
that the implementation has to do that wrapping.  Ugly, but at least
it keeps the ugliness internal to the container implementation.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:57 PM

>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> This seems like a real issue.  Either the AI needs to specify that
> default-discriminated record "don't work", as it were, or the
> implementation needs to do the record-wrapping.

The problem is that the element type is aliased.  Wrapping it internally
won't work because Generic_Element returns an access object that
designates the element, not the wrapper.

You can't satisfy both conditions simultaneously.  Personally I find
in-place modification of elements much more useful than being able to
store (unwrapped) default-descriminated elements.

One compromise solution is to only disallow instantiation of
Generic_Element, rather than the whole package, if the element type has
a default-discriminant.  But I don't know whether this is possible
within the language.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  9:31 AM

>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> I think that's not an acceptable answer in this case.  These aliased
> element components are part of the implementation.  The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type.  Instead it would seem
> that the implementation has to do that wrapping.  Ugly, but at least
> it keeps the ugliness internal to the container implementation.

That won't work.  Generic_Element returns an access value that
designates an object of type Element_Type, not the internal wrapper
type.  The problem is that objects of (default-discriminated)
Element_Type can't be aliased, so I'm not allowed to say Element'Access.

Perhaps there is some other solution.  I'm not really sure...

****************************************************************

From: Gary Dismukes
Sent: Monday, February 9, 2004  3:47 PM

Matt Heaney wrote:
>
> That won't work.  Generic_Element returns an access value that
> designates an object of type Element_Type, not the internal wrapper
> type.  The problem is that objects of (default-discriminated)
> Element_Type can't be aliased, so I'm not allowed to say Element'Access.

True, that's a problem.

> Perhaps there is some other solution.  I'm not really sure...

Another solution is to use 'Address and unchecked conversion
to the access type, and forget the aliased component.  This is
starting to look unpleasant though :-(

What we really need is something like Tucker's proposal in AI-363
(eliminating access subtype problems), which would prevent this
pesky aliased problem altogether...

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  4:01 PM

Right. And that's still on the table, so there may ultimately be no problem
here for Ada 200Y.

****************************************************************

From: Simon J. Wright
Sent: Tuesday, February 9, 2004  3:16 AM

The Booch Components use Address_To_Access_Conversions for this
precise purpose.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  4:07 PM

Indeed.  What I was trying to do with Generic_Element is something
similar to what you have in C++:

{
    std::vector<int> v;

    v.push_back(42);

    int& i = v.back();

    ++i;  // i becomes 43
}

The problem is that we don't have references in Ada.  But even so you
can do something like this:

type Integer_Access is access all Integer;

function To_Access is
    new Integer_Vectors.Generic_Element (Integer_Access);

declare
    V : Integer_Vectors.Vector_Type;
begin
    Append (V, New_Item => 42);

    declare
       I : Integer renames To_Access (V, Last (V)).all;
    begin
       I := I + 1;   -- I becomes 43
    end;
end;

This works but the model breaks if the element type has a default
discriminant.

In the case of Integer it is perhaps not necessary to use this
mechanism, but consider if the element of the container is another
container.  You need a variable view of the container element in order
to manipulate it.

I wish there some other way, something like:

   function Element (V : VT) return Element_Type'Reference;
   --in the pseudo vectors pkg

declare
    V : Integer_Vectors.Vector_Type;
begin
    Append (V, New_Item => 42);

    declare
       I : Integer renames Element (V, Last (V));
    begin
       I := I + 1;
    end;
end;

Here Element_Type'Reference is some kind of virtual type that is limited
and indefinite.  The only thing you're allowed to do with the the value
returned by a function that returns T'Reference is to rename it.

But perhaps the ARG has some other, more elegant technique.  Just food
for thought...

****************************************************************

From: Tucker Taft
Sent: Monday, February 9, 2004  5:50 PM

Gary Dismukes wrote:
> ...
> I think that's not an acceptable answer in this case.  These aliased
> element components are part of the implementation.  The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type.  Instead it would seem
> that the implementation has to do that wrapping.  Ugly, but at least
> it keeps the ugliness internal to the container implementation.

I agree.  Just declare a local record type that wraps the
user's type.  And/or hope that the AI that solves this
problem gets accepted.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  10:18 AM

Tucker Taft wrote:

> I suggest the use of controlled types if you want implicit
> levels of indirection in the keys or the elements.  Having the
> container worry about storage management issues relating to elements
> or keys significantly increases their complexity.  We very much
> want these containers to be straightforward to define and use.
> They are definitely not the final answer, but more the initial
> answer -- the 20% that can handle 80% of the problems.

Ahhhh, the voice of reason.  This is exactly right.

If you want indefinite key types, then you pay that privilege, by having
to do the memory management of indefinite keys yourself.  This is how it
should be.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  12:40 PM

But you could not even strore a collection of strings. Ok, there are unbounded
strings. But storing 'Class thats the killer feature. If Ada.Containers can't
do it I am not interested. The will be no 20%/80% split. Its 0% - I won't us
them.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004  12:36 PM

Sounds more like the voice of the Devil, or at least De Sade, to me.
"Want indefinite? Go do memory management!" Too much pointer programming
in your minds, dudes. No doubt from much systems programming in your
resumās, but you forget not everybody is a systems programmer. For an
application programmer that 80% figure is just so wrong.

(Matt, "this is exactly right", "this is how it should be"? Assertive is
good but now you're sounding like some God (or Devil). I thought you
were an ateist ;-)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  12:58 PM

Ada is a low-level systems programming language.  It gives you the tools
to build higher-level abstractions.

If you need to store elements whose type is indefinite, then you have to
build that abstraction yourself, perhaps using the low-level containers
as a substrate.

As Tucker stated, the containers are the starting point, not the ending
point.  Certainly, building the higher-level abstraction is much easier
with the low-level containers than without.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  12:53 PM

> But storing 'Class thats the killer feature. If Ada.Containers can't
> do it I am not interested. The will be no 20%/80% split. Its 0% - I won't
> use them.

The library is designed around the common case, which means definite key
and element types.

If you want to store elements of type T'Class, that you have to use an
access type to instantiate the component, and then do the memory
management of elements yourself.

This is how it should be.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  1:15 PM

 > Ada is a low-level systems programming language.  It gives you the tools
 > to build higher-level abstractions.

As you seem to like strong arguments, let me try this:

This is plain wrong :) Ada is not low-level and certainly not a system
programming language. Ada is an high level language without a specific
domain, this is my point of view.

I find really strange that only Vector is being considered for example. It
would be really useful to have queue, list and stack. Now limiting the
containers to definite types is another restrictions...

The idea behind the Ada containers was to have a common set of useful
components for Ada to avoid reinventing the wheel... So the argument
"If you need to store elements whose type is indefinite, then you have to
 build that abstraction yourself" sounds boggus to me ;)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  1:26 PM

You can use a vector as a stack.  The library doesn't need to provide a
stack directly.

The library does not provide a list.  I wish it had a list, but the
subcommittee had to reduce the scope of the library and so list didn't
make the cut.

You can use a list as a queue.  The library doesn't need to provide a
queue directly.  However, the library doesn't provide a list, so it
doesn't provide a queue either.

Note that if you need a priority queue, you can use the sorted set.  The
library doesn't need to provide a priority queue directly.


> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
>  build that abstraction yourself" sounds boggus to me ;)

I didn't mean that you have to build the component from scratch.  I
meant only that you have to do the memory management of indefinite
elements yourself.  The higher-level component that you build can be
implemented using the low-level containers.

Real systems are built from the bottom up.  All we did was to provide
the lowest level in the abstraction hierarchy.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:00 PM

Matthew,

 > You can use a vector as a stack.  The library doesn't need to provide a
 > stack directly.

Except that a stack should have a far more limited set of operations. This
ensure that the stack abstraction is not worked-around.

 > The library does not provide a list.  I wish it had a list, but the
 > subcommittee had to reduce the scope of the library and so list didn't
 > make the cut.

I really think that this should be reconsidered. A list is the most used
abstraction in many software I have built/seen.

 > You can use a list as a queue.

Of course but again this is wrong in my view. The abstraction should be
constrained to the set of operations for a queue. In that case why not remove
the vector, it can be implemented easily with a map, the key is the index of
the item in the array :)

 > Note that if you need a priority queue, you can use the sorted set.  The

This is more high level component, I agree that it is ok to not include it.

If we miss some important components in the standard container library what we
will do ? Use another component library like Charles or PragmArc... an not use
the standard container library... so what the point ????

The most important point in a container library is *completeness* I would
say. This is exactly what STL has done.

****************************************************************

From: Martin Krischik
Sent: Monday, February 9, 2004  12:16 PM

> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type.  This
> shouldn't be a problem since the map object is typically part of some
> higher-level abstraction anyway, so you can hide the allocation and map
> manipulation from the users of that higher-level abstraction.

But Ada hasn't got a garbage collector so there is the deallocation problem.
Especialy when the container copied or passed around.

And Ada (unlike C++) can to better! With Ada you can have a container with
indefinite types where with C++ you can't. We should not give away that
advantage.

****************************************************************

From: Marius Amado Alves
Sent: Monday, February 9, 2004  1:07 PM

> Ada is a low-level systems programming language.  It gives you the
> tools to build higher-level abstractions.

Ok. Thanks for recentring the argument. So your position is that the
standard should not give high-level facilities. Personally I see Ada's
doom in that position. A stillborn Ada 2005.

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:03 PM

Sadly, I feel alike :(

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  1:56 PM

> If you want indefinite key types, then you pay that privilege, by
> having to do the memory management of indefinite keys yourself.  This
> is how it should be.

Ok. I'd like to see that rationale documented in the final version of
the AI, so people understand why Ada.Containers.String_Map isn't
simply an instantiation of Ada.Containers.Map.

One more argument for indefinite keys; if a C++ person looks at this,
they can say "Ada generics are so weak they can't even allow a String
as a key!". Not good for the "let's attract more users" goal.

And I will continue to use SAL, where the containers do the memory
management, because I like that design point better :).

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:13 PM

Stephen Leake wrote:

> One more argument for indefinite keys; if a C++ person looks at this,
> they can say "Ada generics are so weak they can't even allow a String
> as a key!". Not good for the "let's attract more users" goal.

But you can't do that in C++, either.  Indeed, C++ doesn't have
indefinite types so it's unlikely a C++ programmer would even think to
ask that question.


> And I will continue to use SAL, where the containers do the memory
> management, because I like that design point better :).

Real systems are built from the bottom up.  All we did was to provide
the lowest-level in the abstraction hierarchy.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  4:20 PM

> But you can't do that in C++, either.  Indeed, C++ doesn't have
> indefinite types so it's unlikely a C++ programmer would even think to
> ask that question.

Hmm. To be specific;

can a C++ STL Map be instantiated with a C++ STL String as the Key?

I'll have to check, but I bet the answer is "yes".

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  6:27 PM

Yes of course an STL map can be instantiated with type std::string as the
key, but that type is analogous to Ada's Unbounded_String, not String.

> I'll have to check, but I bet the answer is "yes".

Yes it can, but you're comparing apples and oranges.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  8:36 PM

Ok. And Ada.Containers.Map can be instantiated with Unbounded_String
as the Key. Good enough.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:22 PM

Pascal Obry wrote:

> Except that a stack should have a far more limited set of operations. This
> ensure that the stack abstraction is not worked-around.

Fine.  Then you can implement that stack abstraction yourself, using a
vector as the implementation.

> I really think that this should be reconsidered. A list is the most used
> abstraction in many software I have built/seen.

I think so too, but the subcommittee had to reduce the scope of the
proposal and so lists didn't make the cut.

If you ask for too much then you might not get anything.

> Of course but again this is wrong in my view. The abstraction should be
> constrained to the set of operations for a queue.

Fine.  Then you can implement that queue abstraction yourself, using a
list as the implementation.

>In that case why not remove
> the vector, it can be implemented easily with a map, the key is the index of
> the item in the array :)

That would be an example of "abstraction inversion": using a
higher-level abstraction to implement a more low-level one.

This is the mistake they made in Ada83, requiring that high-level tasks
be used to implement low-level synchronization constructs as semaphores
and monitors.

Ada is a low-level systems programming language.  It is not Perl.

> If we miss some important components in the standard container library what we
> will do ? Use another component library like Charles or PragmArc... an not use
> the standard container library... so what the point ????

Do whatever you're doing now.

The intent of the committee is that this small, modest set of containers
will provide the impetus for a secondary standard.

> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.

Well, my original proposal included all the containers in the STL and
then some.  So don't blame me!

****************************************************************

From: Pascal Obry
Sent: Monday, February 9, 2004  2:52 PM

 > Fine.  Then you can implement that stack abstraction yourself, using a
 > vector as the implementation.

Of course, I also can implement every thing myself :)

 > Fine.  Then you can implement that queue abstraction yourself, using a
 > list as the implementation.

Of course, I also can implement every thing myself :)

 > That would be an example of "abstraction inversion": using a
 > higher-level abstraction to implement a more low-level one.

As it is to implement a stack over a vector abstraction.

 > Ada is a low-level systems programming language.  It is not Perl.

It is not Perl, but it is not either a low-level systems programming
language :) And yes I'll keep repeating this :)

 > Do whatever you're doing now.

But I don't !!! That's the whole point of the container library.

 > The intent of the committee is that this small, modest set of containers
 > will provide the impetus for a secondary standard.

Ok. That's a point.

 > > The most important point in a container library is *completeness* I would
 > > say. This is exactly what STL has done.
 >
 > Well, my original proposal included all the containers in the STL and
 > then some.  So don't blame me!

I know Matthew and I want to thanks you for the hard work. I just expected a
bit more so I'm frustrated :)

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:00 PM

Martin Krischik wrote:

> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especialy when the container copied or passed around.

You are responsible for memory management of the indefinite elements.
Implement your high-level abstraction using the low-level container,
instantiated with an access type.

> And Ada (unlike C++) can to better! With Ada you can have a container with
> indefinite types where with C++ you can't. We should not give away that
> advantage.

There is only a slight difference here between Ada95 and C++.  In Ada95
you can do this:

    procedure Insert (C : in out CT; E : in ET) is
       EA : constant ET_Access := new ET'(E);
    begin
       ...

This will work even if ET is indefinite.

In C++ the type has to have a clone operator or whatever:

    void insert(const e_t& e)
    {
       e_t* const pe = e.clone();
       ...
    }

Internally the components wouldn't be any different.

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  2:04 PM

Matthew Heaney <mheaney@on2.com> writes:

> Stephen Leake wrote:
>
> > What is the rationale for making the Map Key_Type definite, as opposed
> > to indefinite? Since an indefinite Key_Type is required for
> > Containers.Maps.Strings, why not make that capability available to the
> > users?
>
> Because that would punish users that have definite key types.

Can you elaborate on this? I don't see it.

> Also, type String isn't just any indefinite type. It's an array.
>
> The reference implementation for String_Maps looks like this:
>
>     type Node_Type;
>     type Node_Access is access Node_Type;
>
>     type Node_Type (Key_Length : Natural) is
>        record
>           Key     : String (1 .. Key_Length);
>           Element : aliased Element_Type;
>           Next    : Node_Access;
>        end record;

Obviously you can optimize a container if you know the specific types
involved. But the standard containers aren't supposed to be about
highly optimized code; they are supposed to be about generally useful
code.

> > I don't see a discussion of this in AI-302-03/01.
>
> There is a paragraph in there explaining why we have a dedicated maps
> whose key type is String.

Yes. It does _not_ say why Ada.Containers.Maps.Key_Type is _not_
indefinite. That's what I'd like to see.

> > Another point: Containers.Vectors.Size should return
> > Index_Type'Base, and the Size parameter in Resize should also be
> > Index_Type'Base. It's confusing to have different types for Size
> > and Index.
>
> No.  The parameter of the Resize operation specifies a hint about the
> future length of the container, which is subtype Natural.

Why is it Natural? Randy pointed out that Index_Type'Base might not
include 0, or even be an enumeral. I'd rather see Index_Type be
specified as a signed integer, including 0, rather than have Size
return a type that is not Index_Type. (SAL makes this choice).

> > There's also a problem if Natural'Last < Index_Type'Last; you
> > can't have a vector that contains every index!
>
> The assumption is that a container will always have fewer the
> Integer'Last number of elements.  (On a 32 bit machine that's 4.2
> billion values...)

And that assumption is precisely the problem. On systems where
Integer'Last is 2**15, you can't have large containers. Ada must not
make such assumptions!

****************************************************************

From: Stephen Leake
Sent: Monday, February 9, 2004  2:14 PM

> The internal node of the map reference implementation looks like this:

Ok. That makes sense. I suggest this level of detail be kept in the
Rationale for the Ada.Containers package.

I address this issue in SAL
(http://www.toadmail.com/~ada_wizard/ada/sal.html) by allowing the
user to specify both the Key_Type and the Key_Node_Type, and provide a
function To_Key_Node to go from one to the other. For definite keys,
the types are the same, and To_Key_Node is an inlined null function,
so there is no overhead. For indefinite keys, that function does the
allocation.

Hm. In shared code generics, I guess the "inlined null function" does
not get optimized away. So perhaps this would not be an appropriate
approach for a standard Ada package.

Actually, in SAL, keys are always stored in the Items, so you'll only
see Item_Type, Key_Type, and Item_Node_Type, not Key_Node_Type. But
the principle is the same.

It is more complex to instantiate SAL containers than the proposed
Ada.Containers.Map. But I would argue that it is worth it.

> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type.
> This shouldn't be a problem since the map object is typically part of
> some higher-level abstraction anyway, so you can hide the allocation
> and map manipulation from the users of that higher-level
> abstraction.

Ok. In SAL, I don't have two layers. And I agree with others who say
that Ada should provide a useful container that does "typical" memory
management tasks for you.

But any container is better than none :).

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Monday, February 9, 2004  3:05 PM

Pascal Obry wrote:

> Ada is not low-level and certainly not a system
> programming language. Ada is an high level language without a specific
> domain, this is my point of view.

Self-contradictory viewpoint, though - because high level language without a
specific domain and low-level system programming language are roughly the same
thing -:)

> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
> build that abstraction yourself" sounds boggus to me ;)

If we call them "containers" then they should, in some substantial sense,
*contain* things, not just refer to them, So, in this case, they should do
all associated memory management. Otherwise, they aren't Containers, they are
Inventories. It is improper name that confuses the matter and creates heated
argument.

Also, it seems that the library is planned without looking at new features
in Ada2005, particularly, interfaces. I think that this (if true) may be a
serious mistake. Interfaces may provide a way for reconciling different
requirements.

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004  1:04 AM

I would be very happy to see an Ada.Container.Interfaces (or
Ada.Container.Signatures) package/hierarchy, specifying APIs, which could
then be used to achieve (static) polymorphism.
I think this is the palce to provide Stack, Queue interfaces etc. as well.
I think that's a good way to encourage the building block approach.
As far as I recall the workshop we had in Vienna (right?), not many shared
my enthusiasm, alas.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 10, 2004  6:53 PM

Alexandre E. Kopilovitch wrote:

...
> Also, it seems that the library is planned without looking at new features
> in Ada2005, particularly, interfaces. I think that this (if true) may be a
> serious mistake. Interfaces may provide a way for reconciling different
> requirements.

I wondered how long it would be before someone asked that question.

I did in fact do some (idle) thinking on that question, and I concluded that
interfaces wouldn't be useful for the containers library.

What you'd like is to be able to write interfaces that describe iteration,
for example, and be able to use those without knowing anything about the
underlying container. Similarly, you could have a sequence interface that
worked with any sequence container.

However, that doesn't really work. The primary problem is that the profiles
of the operations of an interface are fixed other than the object itself.
But, for a container, the operations contain a generic formal type (the
element type), as well as the object type. That means that general
interfaces (like the ones described above), for example) can't be written
that would match any possible element type, only a specific element type
(which is pretty useless).

One way to get around that would be to put the interfaces into the generic
units. But then, the interfaces would only be usable with that container --
hardly a useful interface! You might as well just use the container
directly.

A better way would be to make the element type an interface itself. Then you
could write useful non-generic interfaces. But that would limit the
contained objects to types that can have an interface: tagged types, and
perhaps task and protected types (and of course have the required
interface). That sort of limitation isn't going to fly for the primary
container library - a container of access values is just too common and
important. (I could imagine an O-O offshoot that worked at that way - in a
secondary standard.)

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Tuesday, February 10, 2004  9:45 PM

Randy Brukardt wrote:

> I did in fact do some (idle) thinking on that question, and I concluded that
> interfaces wouldn't be useful for the containers library.
>
> What you'd like is to be able to write interfaces that describe iteration,
> for example, and be able to use those without knowing anything about the
> underlying container. Similarly, you could have a sequence interface that
> worked with any sequence container.

Yes.

> However, that doesn't really work. The primary problem is that the profiles
> of the operations of an interface are fixed other than the object itself.
> But, for a container, the operations contain a generic formal type (the
> element type), as well as the object type. That means that general
> interfaces (like the ones described above), for example) can't be written
> that would match any possible element type, only a specific element type
> (which is pretty useful).

This shows an unpleasant incompatilibity of interfaces with generics. Well,
perhaps "incompatibility" is too strong word for that, but anyway there is
some inconsistence, these notions do not collaborate smoothly. And this is
a general issue, regardless of container library.

> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.

Yes, this is clearly a poor way.

> A better way would be to make the element type an interface itself. Then you
> could write useful non-generic interfaces. But that would limit the
> contained objects to types that can have an interface: tagged types, and
> perhaps task and protected types (and of course have the required
> interface). That sort of limitation isn't going to fly for the primary
> container library - a container of access values is just too common and
> important.

I don't understand the latter sentence - I thought that access to interfaces
is permitted... I'm looking at the last example in AI-251 (under the line
"A somewhat less artifical example") - there is type Object_Reference, which
is access to interface type Monitored_Object'Class, and this Object_Reference
is used for parameters of procedures Register and Unregister.

And if you meant that those access values may point to untagged types then
I think that "boxing" those untagged types will not significantly annoy a
programmer.

But anyway I don't think that this way is generally better. It artificially
pushes a containter in position of "controlling object", which isn't a good
thing. And it often convolutes thinking... seems no better than typical C++
puzzles, a maintainer's hell.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 10, 2004 11:03 PM

> I don't understand the latter sentence - I thought that access to interfaces
> is permitted... I'm looking at the last example in AI-251 (under the line
> "A somewhat less artifical example") - there is type Object_Reference, which
> is access to interface type Monitored_Object'Class, and this Object_Reference
> is used for parameters of procedures Register and Unregister.

Yes, but access types themselves are not tagged. What they point at is
irrelevant. If you have a formal "type T is tagged private;" no access type
will match that; it's the same for interfaces.

You could of course wrap the access type in a tagged record, and give the
interface to that, and then the element type could be that. But then you have
an extra component name in every use, which is annoying.

For lower-level uses, having a vector/sequence of pointers or a map of pointers
certainly sounds useful and common; forcing wrapping is not going to win any
style points.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:28 PM

Regarding support for indefinite keys,
Martin Krischik said:

> But you could not even strore a collection of strings. Ok, there are
> unbounded strings. But storing 'Class thats the killer feature. If
> Ada.Containers can't do it I am not interested. The will be no 20%/80%
> split. Its 0% - I won't us them.

How about this: you write a package that supports the indefinite case,
and you build it on top of the (currently proposed) standard package
that supports only definite?  The definite-only package takes care of
the hashing or whatever, and your package takes care of memory
management for the indefinite keys.

Maybe you try to get your package to be a de-facto standard, or a
secondary standard.

The point is, you *can* use the definite-only package, but only
indirectly, via a wrapper package.  The definite-only package isn't
useless; it does *part* of the job you desire.  This seems like a better
design than making a single package that supports both, and somehow
magically optimize the definite cases.

If the RM supports indefinite, I claim it should do so by providing two
separate packages.  But we're trying to minimize the size of all this,
so we choose just the lower-level one of those.

Yeah, it would be nice if the RM provided both...

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  5:36 PM

These seem like an ideal candidate for the hoped-for containers secondary
standard.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  11:09 AM

Randy Brukardt wrote:

> If we want an array sort, we should declare one:
>
>     generic
>        type Index_Type is (<>);
>        type Element_Type is private;
>        function "<" (Left, Right : Element_Type) return Boolean is <>;
>        type Array_Type is array (Index_Type) of Element_Type;
>     procedure Ada.Generic_Sort (Arr : in out Array_Type);
>
> (We'd need an unconstrained version, too.) But keep it separate from the
> Vector one (or any List one, for that matter).

I added a the following library-level declarations to the latest
reference implementation:

AI302.Containers.Generic_Sort_Constrained_Array
AI302.Containers.Generic_Sort_Unconstrained_Array
AI302.Containers.Generic_Sort

The latter works for any sequence having a random-access iterator, um, I
mean cursor.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

They're all basically the same: a simple quicksort using a median-of-3
to choose a pivot.

The Generic_Sort for the vector is implemented as an instantiation of
the generic sort for arrays.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>

****************************************************************

From: Robert A. Duff
Sent: Sunday, February 8, 2004  12:09 PM

Marius Amado Alves wrote:

> In the meanwhile, there is no requirement that Ada.Containers be
> implemented strictly in Ada, is there?

No.  However, there is "meta requirement" that Ada.Containers be
implementABLE in Ada, and I expect all implementations will be in plain
vanilla Ada without compiler-specific tricks.

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  1:41 PM

The proposal can be implemented in Ada today.  In fact it already is:

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:58 AM

I agree. I think the meta requirement is the way to go. If there is some good
reason to resort to non-Ada code, it should be allowed, so long as the API is
maintained. BUT, it would reflect badly on the languiage if the only way to
implement this sort of library efficiently would require going outside the
scope of the language. Remember Ada is a general purporse, reuse oriented
language.

One of the reasons I wanted this discussion (and I pushed for a standard
container library back when practically no one wanted to hear...) is that I
think that by working on standard libraries it is easier to focus on areas
where the language needs improvement.
I think this is in fact what's happening right now...

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  2:37 PM

Right, and my point was that I want to keep it that way.
I suggest the AI mention this "meta requirement" in its discussion.

Some folks have suggested some sort of compiler-specific "magic" going
on behind the scenes.  I don't want that.

>...  In fact it already is:
>
> <http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>

I thank you for your hard work on this.  I haven't had a chance to look
at it yet, though.  What sort of copyright does it have?  Can the
various implementers just take your code and use it as their
implementation of this AI?

****************************************************************

From: Matthew Heaney
Sent: Monday, February 9, 2004  2:46 PM

Yes.  That was the intent.

We can attach any copyright necessary to allow implementors or anyone
else to use it.

Will the GMGPL work?  I'm not an expert on these matters.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:36 PM

I suspect the GMGPL would work, but I'm not an expert on these matters,
either.  I suggest you ask Robert Dewar.

****************************************************************

From: Pascal Leroy
Sent: Monday, February 9, 2004  10:41 AM

> I've just posted the report of the containers committee on
> Ada-Comment. The executive summary follows. You can read the
> whole report in the !appendix to AI-00302-3/01, which you can
> find at: http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT
> or you can download the ZIP or tar files from:
> http://www.ada-auth.org/ais.html

Good job.  A few comments after a first perusal:

1 - Insisting on O(N log N) complexity for the sorting algorithm
excludes Shellsort.  This is misguided in my opinion, as Shellsort often
behaves better in practice that Quicksort (in particular, if the input
file is nearly in order).

2 - I would really like it if the definition of containers were written
without a particular implementation in mind.  It's OK to explain that a
Vector is logically an array, but _requiring_ that insertion at the
beginning should take time O(N) is nonsensical!  This is preventing
possibly better implementations.  I have also seen in a mail by Randy
that element access has to be in O(1) (somehow I can't find this in the
AI).  Again, I believe that this is overspecification.  A skip list
would be in my opinion a perfectly good implementation of a Vector, as
in most practical situations the difference between O(1) and O(Log N)
doesn't matter.  But the O(1) requirement precludes a skip list
implementation...

3 - Similarly, I don't understand why the definition of Maps insists on
a hash-based implementation.  I have no problem with the notion that
this generic takes a hash-function, as this can be generally useful
whatever the implementation strategy.  But I don't see why it's
necessary to insist on or expose the details of a hash-based
implementation.  For large maps, a tree-based implementation makes
probably more sense.  We should not prevent such an implementation.
Furthermore, the description seems to require a hash-based
implementation that tries to keep the collision lists reasonably short
(by increasing the number of buckets) and that can lead to very
expensive deallocation/reallocation.

4 - Like others, I don't like the type names ending in _Type (but I
realize that's a matter of taste).  More seriously, I don't like the
usage of the word Vector, as this word is already used by AI 296.  Since
it might make perfect sense to have a vector-302 of vectors-296 (e.g.
successive positions of a mobile) the terminology is only going to cause
confusion among users.  Of all the proposals that I have seen, Sequence
has my preference.  And I don't give a damn what the terminology is in
Java or C++.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  11:02 AM

> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort.  This is misguided in my opinion, as Shellsort often
> behaves better in practice that Quicksort (in particular, if the input
> file is nearly in order).

Or what about linear sorts like address calculation :-)

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).  Again, I believe that this is overspecification.  A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter.  But the O(1) requirement precludes a skip list
> implementation...

I agree this is over specified. Also, O(1) is a bit bogus given caches
anyway.

> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation.  I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy.  But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation.  For large maps, a tree-based implementation makes
> probably more sense.  We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.

I agree with Pascal here entirely

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).  More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296.  Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users.  Of all the proposals that I have seen, Sequence
> has my preference.  And I don't give a damn what the terminology is in
> Java or C++.

I really think the _Type suffix should be avoided, with few exceptions
it is not at all RM style.

****************************************************************

From: Tucker Taft
Sent: Monday, February 9, 2004  12:37 PM

> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort.  This is misguided in my opinion, as Shellsort often
> behaves better in practice that Quicksort (in particular, if the input
> file is nearly in order).

The complexity specifications are intended to set expectations,
without being overly prescriptive.  If there are no shared expectations,
then the containers can end up being frustrating to use.
As usual, the requirement associated with a complexity specification
is that as N => infinity, there is some upper bound on the ratio
between the actual time and the given formula.  We should also
make it clear whether this is for the average case, or the worst case.

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!

Clearly we should say "no worse than O(N)".

> ...  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).  Again, I believe that this is overspecification.  A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter.  But the O(1) requirement precludes a skip list
> implementation...

I am not an expert on skip lists, but it seems critical to appropriate use
that any element of a vector is "directly addressible".  Random access
is a fundamental part of the abstraction, and if that is not efficient,
it will be very hard to create applications that work reasonably across
implementations.  There needs to be some kind of bound on random
access.  If you believe O(Log N) is acceptable, we can consider that.
For a vector, I personally expect O(1), where the constant factor is *very*
small, and the per-component space overhead ratio is no worse than 100%,
even for byte-sized components.

> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation.  I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy.  But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation.  For large maps, a tree-based implementation makes
> probably more sense.

Why?  I would have thought just the opposite.  A hashed map can provide
an average case of O(1), and there is nothing precluding using trees
for the few hash buckets that get big.

> ...  We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.

I feel like you are arguing both sides of the coin here.  You are objecting
to the behavior while at the same time saying we shouldn't specify it.
If it is clear that this is an abstraction whose performance is no worse
than an extensible hash table, then it is more likely it will be used
appropriately.  By doubling on each expansion, the number of reallocations
can be kept relatively small, and the pieces left behind are generally
just the right size for other growing hash tables.

I suppose you could say that you will implement it in a way that makes
your particular customers happy, but I don't think that is a way to
create a standard.  The goal is portability, not only in terms
of correct execution, but also in terms of reasonable, relatively
predictable performance.

I agree we shouldn't overspecify, but nor should we underspecify.  We
need to specify enough to establish useful, reasonable expectations
for implementors and users, so the container library is not just a
toy, but is actually a useful part of the professional Ada programmer's
toolkit.  We certainly should never discourage implementors from
doing better than the minimal requirements, but nor should we encourage
them to deviate so much from the minimal requirements that they have
effectively created a different abstraction, interfering with
portability.

I see the error bounds specified for the elementary functions as a similar
exercise.  They establish expectations, which reduces confusion and
frustration, and helps make it clear when the language-defined functions
can be used appropriately, and when they can't.

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).  More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296.  Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users.  Of all the proposals that I have seen, Sequence
> has my preference.  And I don't give a damn what the terminology is in
> Java or C++.

Of course you don't give a damn.  But the question is whether other users
who do write significant amounts of code in other languages will appreciate
the effort to be part of the mainstream, rather than always trying to
swim in our own creek, elegant and pure as it may be.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  1:10 PM

Tucker Taft wrote:

> The complexity specifications are intended to set expectations,
> without being overly prescriptive.  If there are no shared expectations,
> then the containers can end up being frustrating to use.
> As usual, the requirement associated with a complexity specification
> is that as N => infinity, there is some upper bound on the ratio
> between the actual time and the given formula.  We should also
> make it clear whether this is for the average case, or the worst case.

Big O is not an upper bound, it is a description of asymptotic behavior.
As written, this spec would prohibit a sort whose behavior was
asymptotically linear.

> I am not an expert on skip lists, but it seems critical to appropriate use
> that any element of a vector is "directly addressible".  Random access
> is a fundamental part of the abstraction, and if that is not efficient,
> it will be very hard to create applications that work reasonably across
> implementations.  There needs to be some kind of bound on random
> access.  If you believe O(Log N) is acceptable, we can consider that.
> For a vector, I personally expect O(1), where the constant factor is *very*
> small, and the per-component space overhead ratio is no worse than 100%,
> even for byte-sized components.

What does constant factor mean here? A typical implementation of arrays
will have extreme variable behavior depending on caching. A naive model
in which all access is constant time is unrealistic in any case.

> Why?  I would have thought just the opposite.  A hashed map can provide
> an average case of O(1), and there is nothing precluding using trees
> for the few hash buckets that get big.

I personally think that any comments about performance should be
implementation advice, not requirements. You will get into all kinds
of formal mess if you try to make them requirements, but as IA they
are fine and comprehensible.

> Of course you don't give a damn.  But the question is whether other users
> who do write significant amounts of code in other languages will appreciate
> the effort to be part of the mainstream, rather than always trying to
> swim in our own creek, elegant and pure as it may be.

To me, sequence *is* more mainstream than vector. The latter phrase
comes with far too much baggage :-)

****************************************************************

From: Stephane Barbey
Sent: Monday, February 9, 2004  1:54 PM

Both IDL and UML (OCL) use "Sequence" for unbounded collections
of ordered elements that allow the same element more than once.

OCL offers Set, Bag, Sequence and Collection.

The Ada mapping to IDL offers a Corba.Sequences.Unbounded
(and Bounded) package that are similar in spirit (and in
specification) to what the Ada.Strings.Bounded and
Unbounded packages provide.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  2:07 PM

> I personally think that any comments about performance should be
> implementation advice, not requirements. You will get into all kinds
> of formal mess if you try to make them requirements, but as IA they
> are fine and comprehensible.

All of the performance "requirements" *are* written as Implementation
Advice. There isn't any way that I can think of to make them normative, and
in any case, that would be overspecification.

So, if Pascal wants to ignore them, he can -- he just has to document that
fact.

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  3:23 PM

OK, sorry, missed this, then I have no objection to any of the
statements, though there is still a bit of over-specification
I would say :-)

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  4:27 PM

By the way, I find this discussion somewhat frustrating, because there
are discussions going on in ada-comment, and also on arg.  People are
raising some of the same points on both.  It seems like the ARG should
pay a lot of attention to real users on this issue, but I fear some key
ARG members are not currently listening to ada-comment, and many
ada-comment folks are not seeing the arg mailing list.

Sigh.

Anyway, Pascal Leroy said:

> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!

I'm responding to Pascal's message, because it makes the point so
clearly, but this is really a more general comment.

This is the *usual* view of language design, and the usual view in the
Ada RM -- we specify the high-level semantics, and not the efficiency of
things.

However, I think for a container library, efficiency properties are the
key issue.

Consider "sequences" -- an ordered sequence of items, which can in
principle be numbered from 1 to N (or 0 to N-1, if the programmer
prefers).  There are many possible implementations of "sequence" --
singly-linked lists, doubly-linked lists with dummy header, growable
arrays, fixed-size arrays, etc.  Programmers choose among those
primarily for efficiency reasons.

Therefore, I think we should be thinking about a secondary standard that
contains a variety of "sequence" packages.  Each should be named
according to the intended implementation, so the programmer can choose
wisely.  We're saying "vector" (meaning "array-based" or "contiguous
hunk of storage") should be the one in the next RM -- but we expect
others, like linked lists.

So I disagree with Pascal above -- I think the container packages
*should* have a particular implementation in mind.  I'll even go further
than Randy, and say that instead of "O(1) access" I really want "a
vector/array-based implementation".

Now, you may say that's overspecification.  Why shouldn't the
implementer choose a "better" implementation?  Well, for containers,
there is no "better" -- they just have different efficiency properties
(better for some uses, worse for others).  As a programmer, I need to
know the underlying implementation.

The language designer cannot know which implementation of sequences is
"better".  Nor can the implementer.  Only the programmer can know.
Therefore, we should not let implementers choose, here.

If one implementer chooses "arrays, deallocated and reallocated when
growing" and the other implementer chooses "skip lists", it's a disaster
-- the programmer has no idea which package to choose.

I say, the vectors package should say (as Implementation Advice) "the
intended implementation is as an array", rather than saying something
about O(1) access.  As others have pointed out, there's really no such
thing as O(1) random access -- if you make the vector big enough, you
will get O(log N) because of cache or paging effects.

Then a secondary standard can define 17 other varieties of "sequence"
that have different efficiency properties.  None is "best" for all
purposes.  However, it is desirable that they all have interfaces that
are as similar as possible.

SUMMARY: Don't let implementers choose the one be-all end-all sequence
package.  We choose a particular sequence implementation
(vectors/arrays) that is useful, and let a secondary standard build all
the others.  Let the programmer choose among them.

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  6:01 PM

Robert Duff:

> By the way, I find this discussion somewhat frustrating, because there
> are discussions going on in ada-comment, and also on arg.

Besides Bob's "real user" concerns, I am faced with the aggrevating task of
filing two unrelated threads on the same topic going on at the same time
into the same AI. I fear no one is going to be able to make sense out of the
!appendix section...

...
> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind.  I'll even go further
> than Randy, and say that instead of "O(1) access" I really want "a
> vector/array-based implementation".

That's actually what the Implementation Advice says. But of course it is
Implementation Advice, so it has no force: Pascal can use a skip list if he
wants.

Going further than that would be useless and bad for at least some
implementations.

For instance, because of generic code sharing, the implementation of the
Vector type will essentially be an array of pointers. Because of that, I'll
probably implement this as an array of pointers, and use that to eliminate
copying in insert/delete/sort operations. Technically, that would still be a
correct implementation (insert would still be O(N), just the constant would
be a lot lower). But clearly, the ratio of execution times between the
various operations would be quite different for this package than for the
"canonical" implementation.

To avoid that, you'd pretty much have to specify the body of the package.
But even that doesn't really help. Again, looking at Janus/Ada, you're going
to get (implicit) allocations of the elements. So, for a Vector of
elementary, the cost of an Insert operation could be 20 times more than for
a non-sharing implementation. (While for a Vector of a type with an
expensive assignment, it might only be a few percent more.) For most uses of
the container, this difference in performance (which appears because the
unit is generic) is likely to matter more than the O(N) performance.

Of course, this is an extreme example, but it shows that the actual
performance of the container is going to depend heavily on the
implementation no matter what is specified in the standard. So going beyond
O(N) type specifications for key operations doesn't help, and could be
actively harmful (by preventing innovative implementations).

****************************************************************

From: Randy Brukardt
Sent: Monday, February 9, 2004  5:30 PM

Pascal wrote:

...
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind.  It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!  This is preventing
> possibly better implementations.  I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).

For the record, here's the wording from the AI. (I wrote this, Matt wanted
it, but didn't know how to express it. I'm not sure I do either - but I knew
I didn't want to define the O(N) notation...)

  Implementation Advice

  Containers.Vectors should be implemented similarly to an array. In particular,
  the time taken by Append and Element should not depend on the number of
  items in the Vector, and the time taken by Insert or Delete at First of the
  vector should take time roughly proportional to the number of elements in the
  vector.

And you are correct, the last part of the sentence should say "no worse
than" or something like that. (Although I can't think of any implementation
that meets the first part that doesn't also meet the second part exactly -
you can reduce the constant arbitrarily, but it still is proportional to N.)

> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).

Our original idea was to avoid the "_Type". However, when I tried to do
that, there were a lot of conflicts with package, subprogram, and parameter
names. In the interests of the getting a report done on time, we wanted to
avoid major surgery to the proposal. (Especially updating the examples would
be painful.) So we stuck with "_Type".

If there is a majority opinion that it is worth going forward with these
packages, and that changing the names would be preferred, then I can spend
the time to do it. But I don't want to spend the ARG's limited resources
doing major changes if all we're going to do it kill the proposal anyway. (I
would hope that no one votes against the proposal solely because they don't
like the names - although such a result wouldn't surprise me.)

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  6:01 PM

Randy Brukardt wrote:

> If there is a majority opinion that it is worth going forward with these
> packages, and that changing the names would be preferred, then I can spend
> the time to do it. But I don't want to spend the ARG's limited resources
> doing major changes if all we're going to do it kill the proposal anyway. (I
> would hope that no one votes against the proposal solely because they don't
> like the names - although such a result wouldn't surprise me.)

It's always risky to vote for something that is flawed with the
expectation of fixing it.

On the other hand, at least one delegation in Salem that was strongly
in favor of adding the keyword CLASS to the language voted against
JDI's proposal because they did not like the prefix notation (I told
Jean not to mix up the issues, but he did not listen to me). They
were quite dismayed that the proposal failed. So you never know...

(interestingly to wonder what would have happened at Salem if that
delegation had understood how the vote worked and voted their actual
interests, then the vote would have been 3-2 in favor of class X is ...
and the US was posed to follow the winning side, so the eventual vote
would have been 4-2 and who knows what would have happened?)

(sorry to digress, but it's an interesting little piece of Ada trivia
history :-)

****************************************************************

From: Robert Dewar
Sent: Monday, February 9, 2004  6:04 PM

Robert A Duff wrote:

> I'm responding to Pascal's message, because it makes the point so
> clearly, but this is really a more general comment.
>
 > lot's of sensible stuff deleted here
>
> SUMMARY: Don't let implementers choose the one be-all end-all sequence
> package.  We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard build all
> the others.  Let the programmer choose among them.

I find Bob's comments here to make a lot of sense, and I agree with
all of them (yes I know that's a change in position, but I think the
fact that this is IA, and Bob's useful perspective make the difference).

****************************************************************

From: Pascal Leroy
Sent: Monday, February 9, 2004  4:58 AM

Bob chided me:

> By the way, I find this discussion somewhat frustrating,
> because there are discussions going on in ada-comment, and
> also on arg.  People are raising some of the same points on
> both.  It seems like the ARG should pay a lot of attention to
> real users on this issue, but I fear some key ARG members are
> not currently listening to ada-comment, and many ada-comment
> folks are not seeing the arg mailing list.

Sorry, the signal/noise ratio on Ada-Comment is too poor, I admit that I
don't have the patience to read all that stuff, and I didn't want to get
50 replies to my initial message.  Anyway, to avoid confusion, I promise
I will shut up until this topic is discussed face-to-face in Phoenix.

Randy pointed out:

> All of the performance "requirements" *are* written as
> Implementation Advice. There isn't any way that I can think
> of to make them normative, and in any case, that would be
> overspecification.

I realize that they are IA, and that's fine.  I am just arguing that the
advices as written are excluding perfectly good implementations.  Of
course I can ignore them, but that's not a satisfactory answer to me: if
we put them in the RM they should be useful.

Tuck commented:

> I agree we shouldn't overspecify, but nor should we
> underspecify.  We need to specify enough to establish useful,
> reasonable expectations for implementors and users, so the
> container library is not just a toy, but is actually a useful
> part of the professional Ada programmer's toolkit.

I completely agree with this principle.  The performance advices are
only there to prevent "bad" implementation.  They should not constrain
"good" implementations.  For instance, using a bubble sort is a no-no,
but we want an implementer to be able to use heapsort, quicksort or
shellsort (or a combination of the three).  Similarly, a Vector should
not be implemented using a simple linked list, but an array or a skip
list are both valid implementations.

> If you believe O(Log N) is acceptable, we can consider that.

As others have pointed out, O(1) and O(Log N) are hardly distinguishable
in practice, it's only the multiplicative factor that counts, so yes, I
believe that we should allow O(Log N) access for vectors.

Back to Bob:

> This is the *usual* view of language design, and the usual
> view in the Ada RM -- we specify the high-level semantics,
> and not the efficiency of things.
>
> However, I think for a container library, efficiency
> properties are the key issue.

I don't see what makes a container library so different from all the
rest.  Let me draw your attention to the fact that we don't specify
efficiency properties for the string packages, or for the numerics
(including the matrix operations of AI 296).  I know that Bob doesn't do
numerics, but for people who do, the performance of these libraries are
likely to be more critical than that of containers.  In practice what
happens is that they run benchmarks, and talk sternly to their vendor if
they don't like the results.

> Therefore, I think we should be thinking about a secondary
> standard that contains a variety of "sequence" packages.
> Each should be named according to the intended
> implementation, so the programmer can choose wisely.  We're
> saying "vector" (meaning "array-based" or "contiguous hunk of
> storage") should be the one in the next RM -- but we expect
> others, like linked lists.

You are on the right track to kill this proposal with kindness ;-)

> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind.  I'll even
> go further than Randy, and say that instead of "O(1) access"
> I really want "a vector/array-based implementation".

But what do you gain if you don't specify the multiplicative factor?  I
have this wonderful implementation of vectors, I swear it's O(1), but
for some reason the multiplicative constant is such that it takes 1 sec
on average to access an element.  This is a Duff-compliant
implementation, but hardly a good one.

Surely you don't want to get into the business of specifying the factor,
right?  Unless of course your target is a MIX computer ;-)

> Now, you may say that's overspecification.  Why shouldn't the
> implementer choose a "better" implementation?  Well, for
> containers, there is no "better" -- they just have different
> efficiency properties (better for some uses, worse for
> others).

The same is true for everything.  For the elementary functions, you have
a trade-off between speed and accuracy.  Which is best?  Depends on the
application.  For the random numbers, there is a trade-off between
speed, size of the generator, and quality of the random numbers.  Again,
there is no better implementation.

> If one implementer chooses "arrays, deallocated and
> reallocated when growing" and the other implementer chooses
> "skip lists", it's a disaster
> -- the programmer has no idea which package to choose.

Either the programmer doesn't care, for instance because they only put a
few elements in the vector, and both implementations are fine (that's
Randy's viewpoint, I think).  Or the programmer does care, and he better
run a simple benchmark with, say, a 10-million-element vector, and see
what happens.

> SUMMARY: Don't let implementers choose the one be-all end-all
> sequence package.  We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard
> build all the others.  Let the programmer choose among them.

SUMMARY: For once, I disagree with just about everything that Bob wrote.

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 10, 2004  8:35 AM

> Bob chided me:

I didn't mean to chide you in particular.  In fact, I didn't mean to
chide anybody.  I was merely lamenting the fact that there is no forum
where the public (i.e. ada-comment folks) and the arg can discuss the
issue of containers.  Sorry.

> > However, I think for a container library, efficiency
> > properties are the key issue.
>
> I don't see what makes a container library so different from all the
> rest.  Let me draw your attention to the fact that we don't specify
> efficiency properties for the string packages, or for the numerics
> (including the matrix operations of AI 296).  I know that Bob doesn't do
> numerics, but for people who do, the performance of these libraries are
> likely to be more critical than that of containers.  In practice what
> happens is that they run benchmarks, and talk sternly to their vendor if
> they don't like the results.

It seems to me that for most features of the language, either efficiency
doesn't matter all that much, or else it's fairly obvious what the
efficiency properties will be.  I *know* (roughly) how compilers
represent integers, arrays, records, etc.  But there are many wildly
different ways to represent sequences.  I don't want Vectors represented
as skip lists any more than I want built-in arrays implemented as skip
lists.

There are a few cases like this is Ada already.  One example is
size-changing records (i.e. defaulted discriminants).  Some compilers
choose an allocate-the-max-size strategy, and others choose a heap-based
strategy.  The former is unacceptable when the max size is 2**33 bytes.
The latter is unacceptable in real-time systems that don't want heap
allocation, or whenever the extra level of indirection is too costly.
It's not obvious which implementation choice is "better".
If Ada 83 had specified (as a NOTE or whatever) which choice the
language designers expected, use of this feature would have been
much more portable.

As to numerics, I don't know what I'm talking about, but I know that the
numerics annex is full of accuracy requirements.  Isn't the
implementer's goal simply "as fast as possible, given the accuracy
requirements"?  Are there wildly different implementation strategies?
I was under the impression that it's more like, "spend more money,
make the algorithms incrementally faster".

As to matrices, I don't know what I'm talking about there, either,
but don't we want all vendors to use a two-dimensional array?
An implementer that chose a sparse representation wouldn't be
doing any favors, right?

...
> But what do you gain if you don't specify the multiplicative factor?  I
> have this wonderful implementation of vectors, I swear it's O(1), but
> for some reason the multiplicative constant is such that it takes 1 sec
> on average to access an element.  This is a Duff-compliant
> implementation, but hardly a good one.

I trust implementers not to *deliberately* sabottage their products.
But implementers need to understand what's expected of them.
I want to say that for Vectors, an array-based implementation is
expected -- we're not asking for the world's greatest all-purpose
sequence package here; we're asking for growable arrays.

> Surely you don't want to get into the business of specifying the factor,
> right?  Unless of course your target is a MIX computer ;-)

Agreed.

> SUMMARY: For once, I disagree with just about everything that Bob wrote.

Oh, well.  :-(

****************************************************************

From: Pascal Leroy
Sent: Tuesday, February 10, 2004  10:14 AM

> I don't want Vectors represented as skip lists
> any more than I want built-in arrays implemented as skip lists.

But why?  You have to explain why, you cannot just say "I don't want".

When I look at the specification of Vectors, the first implementation
that comes to my mind is to use an array if the vector is not too large,
and dense enough.  If it becomes too large I would probably want to
switch to a skip list implementation: this would avoid the unreasonable
O(N) cost on insertion/deletion.  Similarly if the vector becomes very
sparse (not many active elements), I would switch to a skip list
implementation to save space (and indexing would become a bit more
costly).

Of course the skip list would not store individual elements, but
probably chunks that have sufficiently high density.

Surely there are a number of parameters/threshold to be selected to do
the switch from array to skip list, but they should be easy to select by
graphing the space/time characteristics of each algorithm and looking
for the point where these characteristics intersect.

Incidentally we do this for string search: for small strings we use the
načve algorithm, and when the string becomes large we switch to
Boyer-Moore.  You could call this overengineering, but as a user I don't
see why you would complain.

Now you've got to explain to me what is wrong with this approach.  Let
me say it again: this is the first thought that comes to my mind when I
read the specification of Vectors, so I'd like to be educated.

****************************************************************

From: Robert Dewar
Sent: Tuesday, February 10, 2004  10:23 AM

surely everyone would prefer a skip list if using a contiguous vector
would force page faults for every access to the vector.

I assume the real interest here is speed, not O(1) at the cost of
any constant

Anyway this is only IA :-)

****************************************************************

From: Robert A. Duff
Sent: Tuesday, February 10, 2004  11:11 AM

Pascal wrote:

> But why?  You have to explain why, you cannot just say "I don't want".

I've got nothing against skip lists.  What I really want is
uniformity of efficiency across implementations.  The only way I know
how to achieve that is for the programmer to choose among basic
implementation strategies.

> When I look at the specification of Vectors, the first implementation
> that comes to my mind is to use an array if the vector is not too large,
> and dense enough.  If it becomes too large I would probably want to
> switch to a skip list implementation: this would avoid the unreasonable
> O(N) cost on insertion/deletion.  Similarly if the vector becomes very
> sparse (not many active elements), I would switch to a skip list
> implementation to save space (and indexing would become a bit more
> costly).

OK, now you're talking about a hybrid strategy.  I don't see how the
implementation could know about sizes and densities at compile time, so
I assume what you mean is that the Vector implementation gathers
statistics at run time, and switches among different strategies based on
that information.

The overhead of gathering statistics and checking them at relevant times
is worth it in some cases, and not in others.  All I'm saying is that
only the programmer can make that choice.  In my current project, we use
growable arrays that are almost always quite small.  The above "fancy"
implementation would be inappropriate.

Now if you say the "fancy" implementation is a good one, fine, then the
RM should encourage *all* implementers to use it.  Then I, as a
programmer, can know that I don't want to use the language-defined
Vectors package.  In other cases, I can decide that the language-defined
package is appropriate.  But if I have no idea what the underlying
implementation is, I can *never* use the language-defined (except
perhaps in toy programs that don't care about efficiency, or portability
thereof).

> Of course the skip list would not store individual elements, but
> probably chunks that have sufficiently high density.
>
> Surely there are a number of parameters/threshold to be selected to do
> the switch from array to skip list, but they should be easy to select by
> graphing the space/time characteristics of each algorithm and looking
> for the point where these characteristics intersect.
>
> Incidentally we do this for string search: for small strings we use the
> načve algorithm, and when the string becomes large we switch to
> Boyer-Moore.  You could call this overengineering, but as a user I don't
> see why you would complain.

Well, I suppose I wouldn't complain about that.

> Now you've got to explain to me what is wrong with this approach.  Let
> me say it again: this is the first thought that comes to my mind when I
> read the specification of Vectors, so I'd like to be educated.

There's nothing wrong with that approach (I assume we're talking about
the Vector case, not the string-search case).  But if you choose that
approach, and some other compiler-writer chooses a wildly different
approach, the programmer will be lost.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  6:54 PM

Jeffrey Carter:

> Randy Brukardt wrote:
> >
> > Huh? You've said, in effect, that the performance isn't good enough
> > for applications where the performance doesn't matter. That's a
> > pretty goofy statement!
>
> Actually, you originally said something like that. You have said
>
> 1. That the vector component should only be used by applications where
> performance doesn't matter.
>
> 2. That the difference in performance between possible implementations
> of vector may be critical to applications that use it.
>
> If performance doesn't matter to these applications, then the
> restriction on implementations should be removed. However, I agree with
> you that even applications that are suitable for the use of standard
> components may find the performance difference between different
> implementations critical.

That's what I get for trying to argue a position that I don't believe in.

My position is that performance does not matter for these components.
Period.

However, that's a minority position, and I understand the other argument.

The trouble with including performance is that then you must have enough
container forms to handle the most common performance profiles - that means
at least 4 sequence containers (and probably more - at least bounded and
unbounded forms, and list and vector forms) and similarly at least 8
associative containers, and we simply don't have the manpower to properly
specify such a library.

But in any case, I'm obviously not good at arguing it the current position,
and I'm not going to try anymore.

---

That said, my opinion is that the only container worth having (with no
performance requirements) is a map. The Set isn't sufficiently different,
and no sequence container is worth the effort. And such a container probably
ought to hold indefinite elements. (Performance doesn't matter, remember.)

But that position is a minority position (of one!), and I'm not going to
argue that, either.

****************************************************************

From: Robert A. Duff
Sent: Monday, February 9, 2004  7:13 PM

> That's what I get for trying to argue a position that I don't believe in.

;-)

> My position is that performance does not matter for these components.
> Period.
>
> However, that's a minority position, and I understand the other argument.

I'm afraid that I take the opposite position: efficiency is the key
issue.  I'll take the liberty of reposting my response on arg here:

[Editor's note: This word-for-word repeat of 50+ lines is removed to keep these
comments manageable. You can find it about 600 lines back; look for "This is
the *usual* view of language design..." in a message from Bob on Monday
at 4:27 PM]

> The trouble with including performance is that then you must have enough
> container forms to handle the most common performance profiles - that means
> at least 4 sequence containers (and probably more - at least bounded and
> unbounded forms, and list and vector forms) and similarly at least 8
> associative containers, and we simply don't have the manpower to properly
> specify such a library.

I'm saying we should lead the way toward those 4 or 8, as opposed to
trying to be the last word on "sequences" or "mappings" or etc.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  7:18 PM

Jeffrey Carter wrote:

> Regarding Size and Resize, you wrote:
>
> > That's no different than many of the attributes in Ada, which (if set),
> > always return the values that they were set to. But what the compiler does
> > with those values is (almost) completely implementation-defined.
>
> There is a difference between a compiler directive and an operation of a
> package. The latter must have well defined behavior that is not
> implementation defined.

That's a goofy statement. There are lots of package operations in Ada that
have implementation-defined behavior. Try any file Open or anything in
Ada.Command_Line, for instance.

> > Huh? Resize tells the container a reasonable size to use; what the container
> > does with that information is up to it. Size simply returns that information.
>
> What does Size return if Resize has not been called?

The implementation-defined initial size of the container.

Note that there is still quite a bit of overspecification in some of the
wording. I didn't have the time or energy to rewrite every second line of
Matt's proposal, and it wasn't clear that I had the support of the committee
to do so, either.

> If the intention is as you described, then the operations appear to be
> useless, and should be eliminated.

Why? Giving a container an idea of how many elements it will contain can be
a great efficiency help. But there shouldn't be any specification of what it
will mean.

> The introductory text to Vectors does not make it clear that this is an
> extensible array (EA).

Probably because no one uses such a term! The first time I can recall anyone
talking about extensible arrays in my 25+ years of programming (including my
college courses) was last week. I of course know what is meant because the
words have their conventional meanings, but I doubt that there are many
people out there looking up "extensible array" in an index!

...
> So, if the ARM gains a mathematical library of matrices and vectors,

It already did. See AI-296, already approved. (Note that this is an old Ada
83 standard that has not been widely used - but the fact remains that Ada
has had vectors in the mathematical sense for a long time.)

> However, this is really a general problem, and a general solution might
> be advisable. There are no predefined modular types in Standard, so we
> might want to add
>
> type Maximal_Count is mod implementation-defined;

Adding types to Standard is dangerous, because they hide ones visible via a
use-clause. We're not planning to add anything named to Standard for this
reason. Adding it to Ada could cause trouble if there is a use clause for
Ada in a program. So, I'd suggest such a type be added to Ada.Containers
(next to Hash_Type).

> I don't understand why the string-keyed maps exist, since they are
> equivalent to a map with an unbounded string key. The implementation
> would have to store the provided key in an appropriate unbounded string,
> or duplicate the functionality of unbounded strings.

No, a stringspace implementation would be much better than Unbounded_String
for storing large numbers of strings of unknown length. That's precisely the
idea of this component (and the reason it exists separately).
Unbounded_Strings require many tiny allocations, while a stringspace
implementation requires just one (or a few) larger ones.

...
> This discussion of the searchable structure and the map based on it
> seems to indicate a basic design problem with the hashed map component.
> A hash table is not trivial to implement correctly. There are uses for
> hash tables other than maps. As it stands, the user who wants a hash
> table must create one, duplicating the effort performed for the map, and
> increasing the likelihood of errors.

Huh? What could you do with a separate hash table that you couldn't do with
a map? The hash "buckets" contain *something*, and that something is (or can
be) the same as the map elements.

I suspect that if you try to develop this separate container, you'll end up
with pretty much the same interface as map - so there is no reason for a
separate version.

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  12:52 PM

Randy Brukardt wrote:

> That's a goofy statement. There are lots of package operations in Ada
> that have implementation-defined behavior. Try any file Open or
> anything in Ada.Command_Line, for instance.

At least I'm consistent :) I agree. In retrospect, I worded that badly,
using general terms when referring to specifics.

>>> Huh? Resize tells the container a reasonable size to use; what the container
>>> does with that information is up to it. Size simply returns that information.
>
>> What does Size return if Resize has not been called?
>
> The implementation-defined initial size of the container.

OK. Let's see if I understand your position correctly. Resize gives the
implementation a hint about a reasonable size to use, but the
implementation may do whatever it wants, including nothing. Size returns
the actual size of something if Resize has not been called, but the last
size given to Resize if Resize has been called, regardless of what the
implementation does (or doesn't) do with that size.

So it appears that you are saying the implementation is required to keep
track of whether Resize has been called, and to store the size passed to
Resize. That doesn't seem like a very useful requirement to me.

It's fun to argue this kind of thing, but we're really wasting time. My
concern is not really what you think should be required, but what the
proposal actually requires.

> Giving a container an idea of how many elements it will contain can
> be a great efficiency help. But there shouldn't be any specification
> of what it will mean.

That's fine. But the specification of Resize requires that it perform an
allocation. That's primarily why these operations concern me.

Allowing the user to know the current size doesn't seem very useful to
me, but I don't see how it can hurt. Allowing the user to force a resize
does seem unwise.

Resize is an appropriate name for the operation as specified. I expect
an operation named Resize to cause resizing. If we're really talking
about giving the implementation a hint about an appropriate size, then
not only does the specification need to be changed, the name also needs
to be different (perhaps Size_Hint?).

> Note that there is still quite a bit of overspecification in some of
> the wording. I didn't have the time or energy to rewrite every second
> line of Matt's proposal, and it wasn't clear that I had the support
> of the committee to do so, either.

Right, and most of my comments were identifying such areas and
presenting alternative wording. I hope, as such, they are useful. I can
understand you not being able to correct all of these, but if they are
not corrected, the current proposal is unacceptable.

Normally, I would think the original author would be the best person to
make such changes. However, Heaney's response to suggestions that the
proposal could be improved has uniformly been that the proposal is
correct as it stands (although I see that after saying that the vector
doesn't need an iterator, he has now added an iterator to his reference
implementation). Perhaps he is more amenable to requests for
modifications from the select committee.

I do have time at the moment, and am willing to make the effort if that
is desired. The committee needs to ask, since I'm unwilling to waste the
effort.

> Probably because no one uses such a term! The first time I can recall
> anyone talking about extensible arrays in my 25+ years of programming
> (including my college courses) was last week. I of course know what
> is meant because the words have their conventional meanings, but I
> doubt that there are many people out there looking up "extensible
> array" in an index!

Surely I didn't invent the term! I agree with you, though. This is a
case where I'm familiar with the concept and have used versions of it
for decades, but I've never encountered a general name for it, except I
know it's not a vector. By analog to unbounded strings, perhaps
unbounded array is best.

> It already did. See AI-296, already approved. (Note that this is an
> old Ada 83 standard that has not been widely used - but the fact
> remains that Ada has had vectors in the mathematical sense for a long
> time.)

Good. I was not aware of this standard. However, this simply reinforces
my opposition to calling unbounded arrays "vectors".

> Adding types to Standard is dangerous, because they hide ones visible
> via a use-clause. We're not planning to add anything named to
> Standard for this reason. Adding it to Ada could cause trouble if
> there is a use clause for Ada in a program. So, I'd suggest such a
> type be added to Ada.Containers (next to Hash_Type).

OK. This should be useful for more than containers, so I'd like to see
it somewhere higher in the hierarchy, though the most important thing is
to avoid defining such types all over the place, like the Count types in
the IO packages. The odds of a conflict if it's in Ada are small, so I
wouldn't think that would be a problem. If the ARG/committee objects to
putting it in Ada, perhaps there should be a special child package for
such things.

> No, a stringspace implementation would be much better than
> Unbounded_String for storing large numbers of strings of unknown
> length. That's precisely the idea of this component (and the reason
> it exists separately). Unbounded_Strings require many tiny
> allocations, while a stringspace implementation requires just one (or
> a few) larger ones.

In general, a key is added to a map only once, and never modified. Using
Unbounded_String would, therefore, only need one allocation per key, so
I don't see that many tiny allocations are needed. However, you probably
know more about this sort of thing, since compilers need to do this kind
of thing a lot, so I may well be mistaken.

> Huh? What could you do with a separate hash table that you couldn't
> do with a map? The hash "buckets" contain *something*, and that
> something is (or can be) the same as the map elements.

Suppose I want to store Integers in a hash table so I can determine if
I've seen one before. There is no mapping from Integers to anything
else. Yes, I can do that with a map, by providing a dummy type for the
element type, and a dummy value for the element parameters, but that's
an ugly kludge. Defining a map in terms of a hash table is neither ugly
nor a kludge.

> I suspect that if you try to develop this separate container, you'll
> end up with pretty much the same interface as map - so there is no
> reason for a separate version.

A hash table doesn't have an operation to obtain an element given a key,
for example. I could agree that there's no reason for a separate map
given a hash table, since maps are trivial to implement with a hash
table, but ideally I'd like to see both. Ada can do better than
expecting the use of ugly kludges.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  1:08 PM

Jeffrey Carter wrote:

> Normally, I would think the original author would be the best person to
> make such changes. However, Heaney's response to suggestions that the
> proposal could be improved has uniformly been that the proposal is
> correct as it stands (although I see that after saying that the vector
> doesn't need an iterator, he has now added an iterator to his reference
> implementation). Perhaps he is more amenable to requests for
> modifications from the select committee.

I said a vector doesn't need an *active* iterator.  My opinion on that
matter hasn't changed: active iterators (aka "cursors") are too
error-prone for (array-based) vectors.

I wasn't sure whether we needed *passive* iterators for a vector, since
Ada already provides a built-in for loop.  However, there has been
interest, and so *passive* iterators were added.

> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before.

Use a set, not a map.

The latest version of the reference implementation now supports the
stream attributes for containers.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  6:35 PM

Matthew Heaney wrote:
> I said a vector doesn't need an *active* iterator.  My opinion on
> that matter hasn't changed: active iterators (aka "cursors") are too
>  error-prone for (array-based) vectors.
>
> I wasn't sure whether we needed *passive* iterators for a vector,
> since Ada already provides a built-in for loop.  However, there has
> been interest, and so *passive* iterators were added.

The actual things said were:

>> Vector should have an iterator, in addition to allowing the user to
>>  explicitly iterate over the structure.
>
> No.  Vector iterators are fragile, and hence very error prone.
>
> They are fragile because the (logical) internal array gets thrown
> away during expansion, which invalidates the iterator.  It's too hard
> to keep track of whether a vector iterator is still valid, and most
> of the time you end up with a dangling reference.

I was discussing the proposal in AI-302-03, so of course I used its
terminology. I did not mention cursors, nor did you.

You should also look "active" and "passive" up in a good dictionary.
Then perhaps you would discover what they mean, and realize that cursors
are passive and procedures are active.

Precision in terminology is important.

>> Suppose I want to store Integers in a hash table so I can determine
>> if I've seen one before.
>
> Use a set, not a map.

A typical answer: the proposal is perfect, therefore any problem a user
has with it must be with the user, not the library.

Yes, I want a set, but I want a hashed set, not one based on an O(log N)
search, perhaps because I know that with my hash function and expected
distribution of values, I can expect O(1) from a hash table.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:33 AM

The terms "active iterator" and "passive iterator" are discussed in
section 7.3, Variations on a Theme: Iterators, in his book describing
the original Booch Components library:

Software Components With Ada
Grady Booch
Benjamin/Cummings Publishing Company 1987

p. 157-8: "Basically, there are two approaches to iteration, called
active and passive.  In the active approach, we expose the iterator as a
collection of primitive operations, but, in the passive approach, we
export only a single operation."

p. 158: "We shall first discuss the active iterator.  The iterator can
be considered an object of an abstract data type, characterized by the
following operations: Initialize, Get_Next, Value_Of, Is_Done."

p. 159: "With the passive iterator, rather than exporting the type
Iterator and its associated operations, we instead export a single
generic procedure that is nested in the specification of the queue
component."

The Iterator design pattern (aka "Cursor") is described in:

Design Patterns: Elements of Reusable Object-Oriented Software
Erich Gamma et al
Addison-Wesley Publishing Company 1995

p. 260: "Who controls the iteration? A fundamental issue is deciding
which party controls the iteration, the iterator or the client that uses
the iterator.  When the client controls the iteration, the iterator is
called an external iterator, and when the iterator controls it, the
iterator is an internal iterator. [footnote on p.260: Booch refers to
external and internal iterators as active and passive iterators,
respectively.  The terms "active" and "passive" describe the role of the
client, not the level of activity of the iterator.]  Clients that use an
external iterator must advance the traversal and request the next
element explicitly from the iterator.  In contrast, the client hands an
internal iterator an operation to perform, and the iterator applies that
operation to every element in the aggregate."

The footnote in Gamma was referring to the information in the section
Iteration in Chap. 9 (Frameworks) of:

Object-Oriented Analysis and Design with Applications, 2nd ed
Grady Booch
Benjamin/Cummings Publishing Company 1994

p. 356: "For each structure, we provide two forms of iteration.
Specifically, an active iterator requires that clients explicitly
advance the iterator; in one logical expression, a passive iterator
applies a client-supplied function, and so requires less collaboration
on the part of the client. [footnote on p. 356: Passive iterators
implement an "apply" function, an idiom commonly used in functional
programming languages.]"


Section 8.3.6 (Iterators) of the Ada95 Quality and Style Guide explains
the difference between active iterators and passive iterators as
follows:

"The terms active and passive are used to differentiate whether the
iteration mechanism (i.e., the way in which the complex data structure
is traversed) is exposed or hidden. A passive iterator hides the
traversal (e.g., looping mechanism) and consists of a single operation,
iterate, that is parameterized by the processing you do on each element
of the data structure. By contrast, an active iterator exposes the
primitive operations by which you traverse the data structure (Booch
1987)."

<http://www.adaic.com/docs/95style/html/sec_8/8-3-6.html>


My article at adapower.com, "Iterator and Factory Method Patterns
Combined," describes the difference between an active and passive
iterator as follows:

"There are two kinds of iterators: passive and active. A passive iterator
controls the actual movement within the data structure, and all a client
has to do is supply a procedure to receive each item in turn.

"An active iterator moves the responsibility for movement onto the
client. Unlike a passive iterator, which is essentially just a generic
subprogram, an active iterator is an actual type, with primitive
operations for retrieving the current item and for moving to the next
item in the sequence."

<http://www.adapower.com/alg/activeiter.html>


The "Algorithms and Data Structures I" (CS 131) course at the Dept of
Computer Science of The George Washington University has this to say
about the distinction between passive and active iterators:

"The linked-list package introduced in Section 8.2 provides an operation
called Traverse, which moves through the list, from beginning to end,
one element at a time, until each element has been "visited" exactly once.

"Formally, this Traverse operation is an example of a passive iterator
operation. An iterator is any operation that iterates through a data
structure one element at a time; we call it passive because the client
program simply calls it once and "stands back" passively while the
iterator roams through the entire structure. In this note, we use the
terms traversal and iteration interchangeably.

"Sometimes an application requires iterating through a structure,
touching each element once, but allowing the client program the
flexibility to decide just when to proceed to the next element. Moving
through a structure in this fashion is called active iteration, because
the client program is actively involved in the process at every step.

Active Iterator Operations: To be actively involved in the iteration,
the client program must execute a loop. We know that any loop must
contain statements for loop initialization, termination, and
incrementation; to support active iteration, the data structure package
must provide these operations, and also one for retrieval of the current
element in the traversal."

<http://www.seas.gwu.edu/~csci131/fall01/active-traversals.html>


The "Advanced Object-Oriented Design & Programming" (CS 635) at San
Diego State University says this about passive iterators:

"Neither Java nor C++ support passive iterators. Smalltalk does support
them. In a passive iterator, you pass a method or function to the
composite object, and the object then applies the method to all elements
in the object."

<http://www.eli.sdsu.edu/courses/spring01/cs635/notes/object/object.html>

In the topic "Generic Programming: Iterators" in the CS 412/512 course
at Old Dominion University, section 1.1 defines passive and active
iterators this way:

"Iterators can be:

o passive: we pass a function to the iterator and tell it to apply the
function to each item in the collection

o active: we ask the iterator to give us items, and each time it does,
we apply the desired function to it."

<http://www.cs.odu.edu/~zeil/cs412/Lectures/09iterators/iterators_summary.pdf>


In his description of the Bedrock framework for Macintosh apps, Scott
L. Taylor describes the iterators of the C++ Booch Components as
follows:

"Each structure comes with its own form of an iterator that allows
traversal of items within a structure. Two types of iterators are
provided for each structure, passive and active. Passive iterators
require much less interaction on the part of the client. A passive
iterator is instantiated and used by calling the iterator's apply()
method with a function pointer to the function to apply to all the
elements within the structure. Active iterators allow much more
flexibility but require more interaction from the client. Active
iterators must be told to go on to the next item, and the iterator
object returns a reference to each item in the structure for the client
to process or use. Active iterators are very similar to MacApp style
iterators."

<http://www.mactech.com/articles/frameworks/7_6/Booch_Components_Taylor.html>


The iterators of the container classes in the ET++ framework are
described like this:

"There are two types of iterators - passive and active iterators. The
latter provide methods for iterating to be called directly by the client
while with passive iterators the client provides a method to be called
on each element in the container."

<http://swt.cs.tu-berlin.de/~ron/diplom/node58.html>


In "An Overview of the Booch Components for Ada95," the iterators are
described this way:

"There are two forms: active and passive. Active iteration requires the
client explicitly advance the iterator. For passive, the client supplies
a single function "Apply" to work across the structure."

<http://www.rivatech.com/booch/documentation.html>
<http://www.pogner.demon.co.uk/components/bc/documentation.html>

> Precision in terminology is important.

Indeed, and my use of the terms "active iterator" and "passive iterator"
is consistent with the references cited above.

> Yes, I want a set, but I want a hashed set, not one based on an O(log N)
> search, perhaps because I know that with my hash function and expected
> distribution of values, I can expect O(1) from a hash table.

My original proposal had hashed sets.  (It also had sorted maps.)
However, in order to reduce the scope of the change to the language
standard the size of the proposal was reduced, and hashed sets didn't
make the cut.  No one got every container they wanted, not even me.

If you need a hashed set right now, then just grab the hash table from
the reference implementation and assemble it yourself.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 12:56 PM

Matthew Heaney wrote:

> The terms "active iterator" and "passive iterator" are discussed in
> section 7.3, Variations on a Theme: Iterators, in his book describing
> the original Booch Components library:
>
> Software Components With Ada
> Grady Booch
> Benjamin/Cummings Publishing Company 1987

I'm familiar with Booch and the many errors he made in this book. I'm
also aware that many others are unable to think for themselves and have
slavishly followed his lead. I see that you have not looked up "active"
and "passive" and thought about what the phrase "active iterator"
actually means in English. You have simply quoted the errors of others.
Argument by authority is always suspect.

We now have a situation where the terms are actively confusing, and no
one who wants to communicate effectively uses them.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  1:41 PM

I don't know what you mean by "glory,"' Alice said.

Humpty Dumpty smiled contemptuously. `Of course you don't -- till I tell
you. I meant "there's a nice knock-down argument for you!"'

`But "glory" doesn't mean "a nice knock-down argument,"' Alice objected.

`When _I_ use a word,' Humpty Dumpty said in rather a scornful tone, `it
means just what I choose it to mean -- neither more nor less.'

`The question is,' said Alice, `whether you CAN make words mean so many
different things.'

`The question is,' said Humpty Dumpty, `which is to be master - - that's
all.'

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  2:07 PM

> We now have a situation where the terms are actively confusing, and no
> one who wants to communicate effectively uses them.

Please let's define then:
   - active iterator: use of a Cursor_Type object
   - passive iterator: use of a generic iteration procedure
I hope that's right...

/*
Personally I don't find the active/passive metaphor the most appropriate.
Manual/automatic would be more fitting. Active/passive for me is more
suggestive of program/data and read-and-write/read-only.

Also a confusion was that before alternative 3 "iterator" meant two different
things, namely the cursor and the abstract procedure (use of...). But now it
only means the latter.
*/

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  5:46 PM

Marius Amado Alves wrote:
>
> Please let's define then:
>    - active iterator: use of a Cursor_Type object
>    - passive iterator: use of a generic iteration procedure
> I hope that's right...

No, the proposal has this right:

Cursor : a value that indicates a specific element in a container.
Iterator: a procedure that applies an action to each element in a
container in turn.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:38 AM

>    - active iterator: use of a Cursor_Type object

Yes.

>    - passive iterator: use of a Generic_Iteration procedure

Yes.

> I hope that's right...

Yes, that's correct.

> Also a confusion was that before alternative 3 "iterator" meant two different
> things, namely the cursor and the abstract procedure (use of...). But now it
> only means the latter.

An iterator is a mechanism for visiting elements in a container.  There
are two kinds of iterators: "active" iterators and "passive" iterators.

****************************************************************

From: Stephen Leake
Sent: Tuesday, February 10, 2004  2:45 PM

Jeffrey Carter <jrcarter@acm.org> writes:

> Allowing the user to know the current size doesn't seem very useful to
> me, but I don't see how it can hurt. Allowing the user to force a resize
> does seem unwise.

The user could run her application for a while, then query the current
size of the map and store it in a config file. Then, when the
application starts the next time, it reads the required size from the
config file, and calls Map.Resize. The intent is that this allows the
application to avoid all the resizes on the second run.

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  3:00 PM

That's one (clever) application of Resize.  The intent is that if you
know a priori what the ultimate number of elements will be, then this
avoids any expansion during insertion.  Insertion behavior is thus more
uniform.

See the examples in ai302/hash and ai302/hash2 in the reference
implementation for more ideas.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>

****************************************************************

From: Jeffrey Carter
Sent: Tuesday, February 10, 2004  6:40 PM

I think I was talking about vectors.

Length is sufficient for this. The main problem is that the
specification prohibits some implementations: Resize is specified as
requiring an allocation, which may not be appropriate for some
implementations. Size_Hint, with no requirement what the implementation
does with the value, is more appropriate.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004 10:37 PM

Jeffrey Carter wrote:

(Sorry, I missed this yesterday.)

...
> OK. Let's see if I understand your position correctly. Resize gives the
> implementation a hint about a reasonable size to use, but the
> implementation may do whatever it wants, including nothing. Size returns
> the actual size of something if Resize has not been called, but the last
> size given to Resize if Resize has been called, regardless of what the
> implementation does (or doesn't) do with that size.
>
> So it appears that you are saying the implementation is required to keep
> track of whether Resize has been called, and to store the size passed to
> Resize. That doesn't seem like a very useful requirement to me.

Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
effect on Obj'Size, but in any case, if you set it, you have to return the
same value (even if that value has nothing to do with how objects are
actually stored).

...
> Resize is an appropriate name for the operation as specified. I expect
> an operation named Resize to cause resizing. If we're really talking
> about giving the implementation a hint about an appropriate size, then
> not only does the specification need to be changed, the name also needs
> to be different (perhaps Size_Hint?).

I don't see a strong need to change the name, but I do agree with you that
there shouldn't be a *requirement* to do some allocation.

...
> > No, a stringspace implementation would be much better than
> > Unbounded_String for storing large numbers of strings of unknown
> > length. That's precisely the idea of this component (and the reason
> > it exists separately). Unbounded_Strings require many tiny
> > allocations, while a stringspace implementation requires just one (or
> > a few) larger ones.
>
> In general, a key is added to a map only once, and never modified. Using
> Unbounded_String would, therefore, only need one allocation per key, so
> I don't see that many tiny allocations are needed. However, you probably
> know more about this sort of thing, since compilers need to do this kind
> of thing a lot, so I may well be mistaken.

One allocation per key is a lot more than one allocation per *map*, which is
what a stringspace implementation takes. (Well, it might have to expand if
it gets full, but that should be rare. It could degrade to one allocation
per key if the keys are very, very long, but some care in implementation
should prevent degrading.)

> > Huh? What could you do with a separate hash table that you couldn't
> > do with a map? The hash "buckets" contain *something*, and that
> > something is (or can be) the same as the map elements.
>
> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before. There is no mapping from Integers to anything
> else. Yes, I can do that with a map, by providing a dummy type for the
> element type, and a dummy value for the element parameters, but that's
> an ugly kludge. Defining a map in terms of a hash table is neither ugly
> nor a kludge.

I have a component like that (it's actually Tom Moran's), but in practice,
I've *never* used it without using the index values it provides to manage
some other data in a separate table (at least statistics and/or debugging).
Even the 'known words' list in the spam filter uses the indexes (handles)
for debugging. If that's the case, why bother having to use a separate
component (causing another chance of error)?

So I would guess that the "dummy type" would gain some real data in 95% of
the applications. And that such uses are less than 10% of the uses of a map
anyway. Since this is a minimal library, we're not trying to cover that
remaining 0.5%.

****************************************************************

From: Randy Brukardt
Sent: Monday, February  9, 2004  7:41 PM

Matt Heaney said:

...
> As I mentioned in my previous message, Resize specifies a hint about the
> future number of elements in --that is, the length of-- the container.
> My assumption is that no container will ever have more than Integer'Last
> number of elements.

Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
assume that no container every has more than 32767 elements??

> If that assumption is incorrect, then maybe the container can be allowed
> to grow internally to more than Integer'Last number of elements, but can
> only report a maximum value of Integer'Last.
>
> Subtype Natural is the correct choice for the vector Resize operation.
>
> I think the ARG wants to use Hash_Type for Resize for the maps.  My
> reference implementation still uses Natural.

Wow! I've been promoted to be the entire ARG! :-)

No, I think we should use a purpose-built type for this, just like we did
for hashing (and for the same reasons). I hope we don't repeat the mistake
of Ada.Strings.Unbounded (which, at least has a justification for making
that mistake).

****************************************************************

From: Matthew Heaney
Sent: Tuesday, February 10, 2004  9:19 AM

> Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> assume that no container every has more than 32767 elements??

I assumed that type Integer corresponded to the "natural" word size of
the machine, and that if Integer were only 16 bits that this portended
other, more invasive resource issues, which precluded very large numbers
of container elements.

But it just goes to show you I don't know very much...

> Wow! I've been promoted to be the entire ARG! :-)

Sorry about that, I should have said "ARG select committee on
containers" but laziness got the better of me.  I'll try to more clear
in the future.

> No, I think we should use a purpose-built type for this, just like we did
> for hashing (and for the same reasons). I hope we don't repeat the mistake
> of Ada.Strings.Unbounded (which, at least has a justification for making
> that mistake).

OK.  But it would be nice if the operators of the length/count/size type
were directly visible at the point where the container instance is
declared, without having to with Ada.Containers too.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  9:53 PM

Matt Heaney wrote:

> Randy Brukardt wrote:
>
> > Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> > assume that no container every has more than 32767 elements??
>
> I assumed that type Integer corresponded to the "natural" word size of
> the machine, and that if Integer were only 16 bits that this portended
> other, more invasive resource issues, which precluded very large numbers
> of container elements.

Never assume about the Standard. :-)

Janus/Ada made the choice of leaving Integer at 16-bits to ease porting of our
many 16-bit customers to our 32-bit compilers. That probably was a bad choice
(because it harms portability of other Ada code to Janus/Ada), but in any case
we're pretty much stuck with it. (Changing would break too much existing code
and especially files.)

3.5.4(21) is the only requirement on the range of Integer; there isn't anything
else, not even Implementation Advice, about going further. If you want
something specific, declare your own.

> > Wow! I've been promoted to be the entire ARG! :-)
>
> Sorry about that, I should have said "ARG select committee on
> containers" but laziness got the better of me.  I'll try to more clear
> in the future.

No, this idea was one that I idly mentioned (and dismissed) a couple of days
ago. I'm pretty sure no one else has talked about it (in either direction).

****************************************************************

From: Jeff Cousins
Sent: Tuesday, February 10, 2004  9:45 AM

Given that the Booch components are now available for free from AdaPower, is
there a pressing need for other containers?

Though having said that, we paid for the Booch components but only found
list_single_bounded_managed, list_utilities_single, heap_sort and quick_sort
to be of much use.

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:44 AM

> If you want to store elements of type T'Class, that you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

I agree with Matt on this one. Especially as regard 'class.
However, I think that strings should be treated as a speciall case. It seems
to me that the easiest approach is to provide a special version of the
packages for this case (a wrapper), which accepts string parameters (and
return type from functions), and uses unbounded internally. This wrapper can
be implemented on top of the basic library (instantiate with unbounded, and
let the wrapper routines simply do the string<->unbounded string
conversions).

One of the good things about having a standard library is that the
restricted component I described and others like it are going to be easy to
create, and share, seeing as they are based on packages all Ada users are
likely to have available. It is not mandatory they themselves be part of the
standard (though in this case I think it would be a valuable addition).

****************************************************************

From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:52 AM

> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.

This is a good point, and keep in mind that I firmly belong to the 80/20 camp.
The reason why this point is well taken is that noone is likely to want to
use 2 (or 3) different contianer libraries inside one application. So the
feature rich library is likely to win over restricted (even standard) ones.
At least when building the _second_ application using such a library...

Howver, I don't think this means adding more stuff to Ada.Containers at this
point. Let's be practical here. Time is short etc. etc.

What should be done, however, is for the community to provide more
components based on the same style (and based on the simple building blocks
that are part of the stadnard lib). Some of these will be adopted into the
core later on, and some will simply coexist nicely with the standard lib
while remaining independent.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004  2:07 PM

Am Montag, 9. Februar 2004 23:28 schrieb Robert A Duff:
> Regarding support for indefinite keys,
>
> Martin Krischik said:
> > But you could not even strore a collection of strings. Ok, there are
> > unbounded strings. But storing 'Class thats the killer feature. If
> > Ada.Containers can't do it I am not interested. The will be no 20%/80%
> > split. Its 0% - I won't us them.
>
> How about this: you write a package that supports the indefinite case,
> and you build it on top of the (currently proposed) standard package
> that supports only definite?

Did that allready - but it based on the booch components.

> The point is, you *can* use the definite-only package, but only
> indirectly, via a wrapper package.  The definite-only package isn't
> useless; it does *part* of the job you desire.  This seems like a better
> design than making a single package that supports both, and somehow
> magically optimize the definite cases.

Agreed, two packages are betten the one. And currently I do the same with the
booch componentes - only I create one from the other with the help of an text
filter instead of using a wrapper.

> If the RM supports indefinite, I claim it should do so by providing two
> separate packages.  But we're trying to minimize the size of all this,
> so we choose just the lower-level one of those.

Maybe the RM should suggest names for extended containers.

****************************************************************

From: Martin Krischik
Sent: Tuesday, February 10, 2004  2:16 PM

Am Montag, 9. Februar 2004 19:52 schrieb Matthew Heaney:

> The library is designed around the common case, which means definite key
> and element types.
>
> If you want to store elements of type T'Class, that you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.

If a garbage collector was provided as well: Yes. Otherwise NO!!

There is something wich upsets me great time:

Half the Ada community says: No garbage collector please! - The container
library should do memory managment.

The other half says: No, container libraries should not provide memory
managment.

It would be better for Ada if the Ada comunity make there mind up.

Well since in AdaCL I have both I made my mind up: container libraries with
memory management is more usefull.

****************************************************************

From: Marius Amado Alves
Sent: Tuesday, February 10, 2004  2:55 PM

I just wrote the thing excerpted below.
The whole is available at
  http://www.liacc.up.pt/~maa/containers
Thanks.

-- TRUC : TRUE CONTAINERS
-- by Marius Amado Alves
--
-- Truc is a proof-of-concept implementation of AI-302/3
-- for indefinite elements, i.e. indefinite generic formal
-- element types (reorder the 4 adjectives at will).
--
-- Truc automatically chooses the appropriate implementation
-- for the actual type. Definite actuals select a Charles-like
-- body, whereas indefinite ones select a SCOPE-like one.
--
-- Truc is 100% written in Ada. Some optimizations could be
-- done by going a bit outside the language. This is
-- discussed elsewhere.
--
-- Only the vector variety is implemented.
-- Only a subset of the interface is implemented.

****************************************************************

From: Randy Brukardt
Sent: Tuesday, February 10, 2004  6:25 PM

I'm going back and filing all of these messages about this AI, and I'm
continually seeing statements like:

"If the containers don't have <my pet feature>, I'm not going to use them."

I realize hyperbole as common on mailing lists, but you have to keep in mind
the current situation.

In order to meet the schedule, the ARG needs to complete proposals by the
end of the June meeting, or they're not going to be in the standard. That
reality means that there is not time to develop a significantly different
proposal. (Wordsmithing is different; I expect there to be plenty of
wordsmithing done on this proposal. I certainly hope that some of the
problems noted by Jeff Carter (for instance) are fixed.)

The strategy proposed by the committee was to standardize something like
AI-302-3, and encourage the development of a secondary standard (at a more
leisurely pace!) to handle creating additional containers to provide
additional functionality, both performance related (bounded forms, lists,
etc.), functional (sorted_maps, unsorted_sets, etc.), and operational
(indefinite keys, indefinite elements, limited elements). We hope that
providing a standard root will channel future developments in a common
direction, rather than the scattershot approach that's currently prevalent.

The ARG is going to have to decide either to follow that strategy, or
essentially give up (because there is no time to develop an alternative).

Now, when you say "I won't use it.", you're putting the ARG members into a
spot:

1) Either the ARG has to standardize over the objections of users, "because
we know better"; or

2) Decide that there is insufficient consensus, and forget the proposal.

My feeling about the brief discussion at the San Diego meeting is that some
ARG members view this as an insolvable problem, and would just as soon
forget it (tossing it to some undefined International Workshop Agreement
process). It took a lot of persuading by Tucker (and to a lesser extent, by
me and couple of others) to set up the committee rather than just tossing it
at that meeting.

I fully expect to revisit that at our next meeting. If the discussion here
gives the opponents too much ammunition, there probably won't be a standard
container library in Ada now (and I personally think *ever*).

If that is your true opinion, feel free to express it - I'd rather spend my
time working on something that will likely be in the standard in that case!
But otherwise, I'd suggest cutting down the hyperbole in your messages.

****************************************************************

From: Marius Anado Alves
Sent: Wednesday, February 11, 2004  4:03 AM

They're not hyperboles. Please don't paternize. The wanted features
missing in the proposal had been expressed since long ago, even
formally, and repeatedly till now, and probably people saw this
discussion as a last change to win the "resistance".

It is clear now we must abandon all hope. Thanks for making that clear
at last.

So it's an incomplete library or none at all. I only fear an incomplete
standard can do more harm than good, principally in respect to
attracting new programmers to the language--by creating a bad first
impression. (For what it's worth, I'd say toss it. Those Front and Back
things looked terrible anyway.)

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004  4:25 AM

 > So it's an incomplete library or none at all. I only fear an incomplete
 > standard can do more harm than good, principally in respect to
 > attracting new programmers to the language--by creating a bad first
 > impression. (For what it's worth, I'd say toss it. Those Front and Back
 > things looked terrible anyway.)

I tend to agree with Marius here. Especially if the next change is for 5 or
10 years from now ! A good programming language needs to provides a decent
container library today. As I have already said, if the library is not broad
enough it will just not be used, Charles, PragmArc or the Booch components
will be used instead... In this case it is not even necessary to add a set
of standard containers...

Just my 2 cents of course.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004  7:57 AM

It looks like the frequent situation of no lack of devil's advocates (who
are only trying to make things better), and too few championing angels :-)

The Ada software I develop can be split into two broad categories:
performance critical, and non performance critical.

When writing the latter, NIH (Not-Invented-Here) is a dirty word to me.  I
want to write software fast and right, and I'll happily reuse standard
components, my own stuff that I've got laying around, and whatever utilities
and libraries have been posted on the Internet for free use.

For example, while Unbounded_Strings gets a lot of abuse in Ada discussions,
it and standard strings have pretty much provided all of the string
processing I've ever needed.

I fully expect as well that I'll be extensively employing Ada.Containers
just as soon as they're standardized.

I don't care if there are some purported conceptual weaknesses or omissions,
or if the implementation could be improved--if it provides the functionality
I need, is effectively bug-free, performs "good enough", and better yet is
part of the Ada standard, it gets used.

I don't want to write infrastructure if there's already packages that
provide it.  I don't want to have to select among the pros and cons of six
different home-grown container package collections and then have to concern
myself with whether the developer is going to maintain them, or if I have to
take on the responsibility for that as well.  (I've never concerned myself
with who's maintaing the Ada.Strings hierarchy, but I do now have to
maintain my own version of a particular container collection.)

I want Ada.Containers, and I will use them.  Make them as good and powerful
as you can, and then shut off the discussion and release them.

****************************************************************

From: Martin Dowie
Sent: Wednesday, February 11, 2004  8:22 AM

> I want Ada.Containers, and I will use them.  Make them as good and powerful
> as you can, and then shut off the discussion and release them.

I'd second that.

What is currently proposed is admittedly limited but it
would be useful. If Matt could adapt "Charles" into the
core of the secondary standard that would be great too.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:36 AM

That is indeed the plan.  The current proposal has only a modest set of
containers but we have to start somewhere.

If you need something right away, there is a reference implementation
available at my home page.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:44 AM

I think you'll find that in spite of its modest size, the containers in
the current proposal are indeed very, very useful.

See in particular the !examples section in the AI itself.

<http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT?rev=1.1>

The reference implementation contains several examples, too.

****************************************************************

From: Martin Krischik
Sent: Wednesday, February 11, 2004  12:38 PM

> The strategy proposed by the committee was to standardize something like
> AI-302-3, and encourage the development of a secondary standard (at a more
> leisurely pace!) to handle creating additional containers to provide
> additional functionality, both performance related (bounded forms, lists,
> etc.), functional (sorted_maps, unsorted_sets, etc.), and operational
> (indefinite keys, indefinite elements, limited elements). We hope that
> providing a standard root will channel future developments in a common
> direction, rather than the scattershot approach that's currently prevalent.

Ok, you are right there. I can easiely live with "indefinite later" - to name
my pet feature - however some expressed an "indefinite never" stand and I
can't live with that.

****************************************************************

From: Marc A. Criley
Sent: Wednesday, February 11, 2004  2:23 PM

I fear the participants on this list are rather detached from the "average
Ada programmer" experience.

Of all the dozens of Ada programming _coworkers_ I've worked with over the
years, I could count on one hand (and not even need all the fingers) the
number that would know or care what Charles or PragmArc are (much less
something called an "ARG"), and those few who'd heard of Booch would recall
it as just something that had been used in the early days.

Where are the journeyman programmers for whom Ada is just the language they
write code in going to find data structures?  If it doesn't show up in the
reference manual, it'll be borrowed from some home- or project-grown thing
that was done before, get ginned up yet again from scratch, or maybe get
copied out of a dog-eared Ada textbook.

Meanwhile, the C++ programmers have got the STL handed to them on a platter,
and the Java programmers have got their big JDK posters and Javadocs with
all those containers documented and ready to use.

But for the Ada programmer that just clocks in, codes, and goes home to
their family, nothing.

****************************************************************

From: Pascal Obry
Sent: Wednesday, February 11, 2004  2:59 PM

 > I fear the participants on this list are rather detached from the "average
 > Ada programmer" experience.

This is not about average something or not. Just that I'm using Ada in the
Information Technology domain. I don't really care(1) about size or
performances, this is not hard real-time nor embedded applications. In the IS
field we need a decent container libraries to speed-up developement. What I'm
saying is that if the container library is not complete I'll use something
else. And since people on the embedded or real-time field are certainly not
going to use the standard containers but most probably some simpler version
hand-coded for the application I'm a bit concerned about the current path...

Ada is *not only* an embbeded real-time programming language!

Pascal.

(1) I did not say that I want quick and dirty code :)
(2) BTW, I'm not sure to be an average Ada programmer :)

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:06 PM

> Ok, you are right there. I can easiely live with "indefinite later" -
> to name my pet feature - however some expressed an "indefinite never"
> stand and I can't live with that.

I don't remember anybody saying "indefinite never", but anyway, *my*
opinion is that a secondary standard containing a rich variety of stuff,
including all the bells and whistles that various folks have asked for,
including indefinite component types, would be a Good Thing.

But somebody has to take charge and push such a secondary standard
through.  I'm not volunteering.  ;-)

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  6:02 PM

The only problem is that there doesn't seem to be any mechanism for such
a secondary standard. Indeed, the intial call for proposals for the
standard container library indicated that it was for a secondary
standard, but it is intended to become part of the ARM now.

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  10:42 PM

The intent is to use a new ISO procedure called an "International Workshop
Agreement". These get published essentially immediately (no lengthy approvals),
and then can later be turned into real standards if that proves to be a good
idea.

But, as Bob mentioned, there have to be people to drive that "Workshop"
(which doesn't need to be an actual workshop per-se).

****************************************************************

From: Jean-Pierre Rosen
Sent: Thursday, February 12, 2004  2:57 AM

There is such a mechanism: it is called an International Workshop Agreement. It
is a relatively new ISO procedure, giving official status (though not formally
Standard) to a specification for which there is consensus. Such an IWA may
become later a full-fledged standard.

Since it is new, nobody really knows how this works, and whether vendors would
feel bound to providing packages defined by IWA. But the mechanism is here.

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 12, 2004  12:49 PM

OK. How do we get such a "workshop" set up? I hope it's obvious that I'm
willing to participate.

****************************************************************

From: Jean-Pierre Rosen
Sent: Friday, February 13, 2004  4:44 AM

It is an ISO process, therefore you should get in touch with Jim Moore.
Of course, the first thing is to have a conveynor. If you step forward...

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:02 AM

As an example of the approach Bob is advocating here, I have included
two examples in the latest reference implementation.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>

The two new examples are for a vector of indefinite elements and a set
of indefinite elements.

Both were implemented as a thin layer on top of the vector and set
containers provided by the library itself.

Neither one took very long to write (in fact I did it while watching an
episode of The Simpsons).

In the indefinite set example, I use the nested generic package
Generic_Keys, and its nested generic package Generic_Insertion.

In the indefinite vector example, I use the library-level Generic_Sort
generic algorithm.

In the test code, I instantiate each component with type String (an
indefinite type).

Note that if you want to instantiate the component with a class-wide
tagged type T'Class, then you'll probably have to declare these
class-wide operations somewhere:

    procedure Is_Equal (L, R : in T'Class) is
    begin
       return L = R;
    end;

    procedure Is_Less (L, R : in T'Class) is
    begin
       return L < R;
    end;

and then use these as the generic actuals for "<" and "=".

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  8:59 AM

> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especialy when the container copied or passed around.

The latest version of the reference implementation has examples of a
vector of indefinite elements and a set of indefinite elements.

Internally both instantiate the underlying container with a simple
controlled type that manages an access object that designates element
type of the higher-level container.

See the Insert_N and Replace_Element operations in the indefinite vector
package for a brief discussion of the various tradeoffs involved.  See
also Generic_Sort2.

See also the indefinite sets package for an example of how to use the
Generic_Keys nested generic package.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  9:24 AM

> -- Truc is a proof-of-concept implementation of AI-302/3
> -- for indefinite elements, i.e. indefinite generic formal
> -- element types (reorder the 4 adjectives at will).

The latest version of the reference implementation has two new examples:
one for a vector of indefinite elements and another for a set of
indefinite elements.

There is no "automatic" selection of a package.  The programmer chooses
the correct package himself, at the time of instantiation.

If he needs to store indefinite elements, then he instantiates the
package for indefinite elements.

If his element type is definite, then he has the choice of using either
the definite or indefinite packages.  The package for definite elements
will be more efficient, of course.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  9:55 AM

> There is no "automatic" selection of a package.  The programmer chooses
> the correct package himself, at the time of instantiation.

I know. I saw your code. It's fine. So one last try: how about configuring
these indefinite elements versions as the one-page specialized needs annex
below? Matt's manual choice approach has the virtue of fitting right in.

ANNEX <IE>

Containers of Indefinite Elements

This Annex provides support for containers of indefinite elements.

[Implementation Requirements]

An implementation conforming to this Annex shall have the package
Ada.Indefinite_Elements and descendants defined in this Annex.

[Static Semantics]

The specifications of the descendants of Ada.Indefinite_Elements are a copy of
the specifications of the descendants of Ada.Containers specified in A.17,
with the unique difference that, for each generic descendant of
Ada.Containers that has a definite element formal type, the corresponding
descendant of Ada.Indefinite_Elements has an indefinite formal type in its
place.

[Dynamic Semantics]

The behaviour associated with each container of Ada.Indefinite_Elements is
exactly like that defined in A.17 for the corresponding container of
Ada.Containers.

[Examples]

Specification of Ada.Indefinite_Elements.Vectors:

  generic
    type Index_Type is (<>);
    type Element_Type (<>) is private;
    with function "=" (L, R : Element_Type) return Boolean is <>;
  package Ada.Indefinite_Elements.Vectors
    -- remainder of this package exactly like that of
    -- Ada.Containers.Vectors

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  11:02 AM

> I just wrote the thing excerpted below.
> The whole is available at
>   http://www.liacc.up.pt/~maa/containers
> Thanks.

It seems inefficient to store *two* vectors for each vector,
and to select between them at run time, when 'Definite is generally
known at compile time.  Why not let the programmer choose to instantiate
one or the other package?

Also, this code uses 'Unrestricted_Access, which is not Ada.

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  9:22 AM

> The latest version of the reference implementation has examples of
> indefinite vectors and indefinite sets, both of which can be used to
> instantiate elements of type T'Class.

Good news!

BTW, Truc (www.liacc.up.pt/~maa/containers/truc.ada) has been updated also,
with:
- a test for classwide element types too (passed:-)
- cosmetics

****************************************************************

From: Marius Amando Alves
Sent: Wednesday, February 11, 2004  12:51 PM

> It seems inefficient to store *two* vectors for each vector,
> and to select between them at run time, when 'Definite is generally
> known at compile time.

As I say in the Truc spec, going outside the language would make it optimized.
I can think of a number of ways to do so, and eliminate those ineficiencies.

Aside.
The two vectors problem could perhaps be eliminated inside the language using
tagged types (it would still be dynamic dispatching though, i.e. a runtime
choice). I tried that but Ada got in my way and so I solved the problem
quickly and dirtly.

Anyway it is not a big inneficiency in practice because only one vector is
used, and the other never used not even initialized vector has neglectable
space and zero time impact.
End of aside.

>  Why not let the programmer choose to instantiate
> one or the other package?

Staying within Ada, yes, that is better, and supports my suggestion to put the
indefinite variants in a separate package branch defined in a specialized
needs annex. And using the already existing reference implementations by Matt
(released today).

> Also, this code uses 'Unrestricted_Access, which is not Ada.

You've got me, I'll have to change the 100% Ada claim to 99% :-)

I used 'Unrestricted_Access instead of the Rosen trick because element types
must be non-limited. Maybe there's another way, but I couldn't think of it.

Aside.
I used AI302.vectors of stream elements to avoid doing memory management, in
one more experiment in pointerless programming. And for other reasons. For
example programming for persistency: I can easily get a persistent container
just by changing the stream operations.

I needed write access to an in container because of this stream approach. Now
I'm curious if Matt's implementation has this and how he did it.
End of aside.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:16 PM

Marius Amado Alves wrote:

> I know. I saw your code. It's fine. So one last try: how about configuring
> these indefinite elements versions as the one-page specialized needs annex
> below? Matt's manual choice approach has the virtue of fitting right in.

I like this idea, but I don't think it should be in a Specialized Needs
Annex (i.e. optional for implementers to support it).  The problem is
not that it's hard to support, but that it adds extra verbiage to the
RM.  You've shown, I think, that the extra verbiage could be pretty
small.

We compiler writers can probably even get Matt to code up the
implementation for us.  ;-)

This idea is much better than a magic package that supports both
definite and indefinite efficiently.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:35 PM

True.

However, I think going outside the language is a bad idea.
I say: An efficient implementation should be possible in pure Ada.
As an implementer, I have no intention of adding compiler magic for this
stuff -- I want to be able to just write pure Ada code (or, better yet,
take advantage of Matt's work).

Even Address_To_Access_Conversions makes me nervous -- yeah, it's Ada,
but it's rather ill-specified.

> I can think of a number of ways to do so, and eliminate those ineficiencies.
>
> Aside.
> The two vectors problem could perhaps be eliminated inside the language using
> tagged types (it would still be dynamic dispatching though, i.e. a runtime
> choice). I tried that but Ada got in my way and so I solved the problem
> quickly and dirtly.
>
> Anyway it is not a big inneficiency in practice because only one vector is
> used, and the other never used not even initialized vector has neglectable
> space and zero time impact.
> End of aside.

I don't want users of definite types to pay *any* penalty caused by
supporting indefinite types.

> >  Why not let the programmer choose to instantiate
> > one or the other package?
>
> Staying within Ada, yes, that is better, and supports my suggestion to
> put the indefinite variants in a separate package branch defined in a
> specialized needs annex. And using the already existing reference
> implementations by Matt (released today).

As I said in my previous message, that suggestion seems reasonable,
except for the cialized needs annex" part.  For portability, we
don't need more optionally-supported features of Ada.

On the other hand, maybe support for indefinite is just too much
(for the Ada RM -- of course a secondary standard should support
all bells and whistles).

> > Also, this code uses 'Unrestricted_Access, which is not Ada.
>
> You've got me, I'll have to change the 100% Ada claim to 99% :-)

OK, but 99% isn't good enough.  I want these packages to implementable
in 100% pure Ada.  If that's not possible (as in the defaulted-discrims
case somebody mentioned) we need to change the language to *make* it
possible.

> I used 'Unrestricted_Access instead of the Rosen trick because element
> types must be non-limited. Maybe there's another way, but I couldn't
> think of it.

Well, I can think of ways involving "for X'Address use.." or
Address_To_Access_Conversions, and I might be willing to live with that,
but I don't like it.

I didn't read your code carefully enough to understand whether
'Unrestricted_Access was really needed.  Why not declare the thing
aliased, and use 'Unchecked_Access?

> Aside.
> I used AI302.vectors of stream elements to avoid doing memory management, in
> one more experiment in pointerless programming. And for other reasons. For
> example programming for persistency: I can easily get a persistent container
> just by changing the stream operations.
>
> I needed write access to an in container because of this stream
> approach. Now I'm curious if Matt's implementation has this and how he
> did it.  End of aside.

Is it not possible to allocate the indefinite thing in the heap,
and still know when it needs to be freed?  I don't like memory leaks...

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  4:24 PM

> I like this idea, but I don't think it should be in a Specialized Needs
> Annex (i.e. optional for implementers to support it).  The problem is
> not that it's hard to support, but that it adds extra verbiage to the
> RM.  You've shown, I think, that the extra verbiage could be pretty
> small.

I actually was going to suggest the same thing, given that the wording
needed is roughly the same supporting a "wide_string" version of something
given a "string" version.

I'll put it as an Open Issue in the "bug fix" update of the AI. (I don't
want to make major changes to the AI, because I don't want to present a
moving target to the ARG members who are supposed to be studying it for the
upcoming meeting...)

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  4:57 PM

I have several places in the reference implementation that I've notated
with "NOTE", places in the AI where the semantics aren't exactly
specified, where there's disagreement, where there can be improvement,
etc.  Should I send you a list or something?  When would you like me to
do that?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  5:21 PM

Sure, do that. Any time is fine, but no later than the start of next week.

****************************************************************

From: Jeffrey Carter
Sent: Wednesday, February 11, 2004  5:56 PM

>>The specifications of the descendants of Ada.Indefinite_Elements are a copy of
>>the specifications of the descendants of Ada.Containers specified in A.17,
>>with the unique difference that, for each generic descendant of
>>Ada.Containers that has a definite element formal type, the corresponding
>>descendant of Ada.Indefinite_Elements has an indefinite formal type in its
>>place.

This doesn't seem quite right. The containers all have an Element_Type
formal, so it can specify that type. Maps have a Key_Type that should
also be indefinite, so it should specify it as well.

However, this seems like a good way to add support for indefinite
elements to the proposal. If only adding additional containers could be
this easy!

****************************************************************

From: Marius Amado Alves
Sent: Wednesday, February 11, 2004  5:46 PM

>I'll put it [Annex <IE>] as an Open Issue in the "bug fix" update of the AI.

Great!

Annex <IE> is all it takes to make the proposal "complete". It is
already fairly complete with respect to structural varieties (vector,
set, map). What it is really missing is element type varieties
(definite, indefinite). The group (definite, indefinite) has *exactly*
the same properties as (vector, set, map). Primitive (in the good sense
of course), concise, complete, useful. (Definite, indefinite) as opposed
to (definite, indefinite, tagged, limited, abstract...), like (vectors,
set, map) vs. (vector, set, map, queue, list...), these two oppositions
are in perfect alignment. The extra things in each latter group can be
realised with the ones in the former. The proposal is complete only if
it is complete along at least these two axes (structural variety,
element type). The other axes--size, persistence--are of lesser impact.
It does not offend me at all to have them set to a fixed point in the
standard--unbounded, core memory--, and extend them in secondary
standards--(fixed, bounded, unbounded...), (core, cache, file...) A nice
simetry. Container space has 4 axis (structure, element type, size,
persistence). Aternative 3 with Annex <IE> ranges over 2, and fixes a
point in the other 2. The ranges and points defining the most primitive
region. The standard region. I promise this is my last motivational
rambling for indefinite elements. I needed to have a view of the whole,
evidently I used my "system of coordinates", and I thought I might share
it with you.

Talking of Open Issues: the range vs. discrete index issue. I'd say
range. It solves the problem of Assert failing on enumerations. And the
use of an enumeration for the index of a *variable* length vector does
not make much sense. Ditto for modular types.

****************************************************************

From: Robert A. Duff
Sent: Wednesday, February 11, 2004  3:40 PM

One thing that disappoints me about the current containers proposal is
that there's no way to control memory allocation.  The C++ STL allows
the client to define which storage pool should be used.  Would it be
possible for us to do the same, without burdening users who just want to
use "the regular heap"?

****************************************************************

From: Randy Brukardt
Sent: Wednesday, February 11, 2004  4:17 AM

I don't think so. There were proposals offered for naming the standard
storage pool(s) and allowing defaults for generic formal parameters, but
both of those died an early death. (See AI-299 and AI-300.) Those were aimed
at solving this problem. Since we're not reintroducing existing, killed
proposals (and certainly the need for them in containers libraries was well
explained when first considered - there's no "new information" here), it
would have to be done without them.

That means that about the only way to do it would be with an access type
(with "null" meaning use the default pool). That seems very ugly to me,
especially as you would have to make the pool that you want to pass in
"aliased". And there doesn't seem to be a good place for that access type to
live.

In any case, that strikes me as creeping featurism.

****************************************************************

From: Matthew Heaney
Sent: Wednesday, February 11, 2004  4:34 PM

You could do something like this:

generic
    type Element_Type is private;
    Pool : in out Root_Storage_Pool'Class;
    with function "=" (L, R : ET) return Bool is <>;
package Ada.Container.Vectors is ...;   //for ex.

There are several problems:

(1) The language standard doesn't specify any storage pool objects.  I
suppose that the standard library could define a few default pool
objects, though.


(2) Even if you do have a pool then you run into problems with static
matching rules, since the generic formal pool type is T'Class, which
doesn't match a specific type NT in T'Class.  So you have to resort to
hacks like:

package My_Pools is

    My_Pool : My_Pool_Type;  --derives from RSP

    My_Pool_View : Root_Storage_Pool'Class
       renames Root_Storage_Pool (My_Pool);

end;

and then use My_Pool_View as the generic actual pool object.

(3) It's in conflict with our design principle that components be easy
to instantiate and use.  I would love to have a generic formal pool
object default a la "is <>" or "is <name>", but the language doesn't let
you specify defaults for generic formal objects.

(4) You might be able to get around (2) by declaring a generic formal
derived type:

generic
    type ET is private;
    type Pool_Type is new Root_Storage_Pool with private;
    Pool : in out Pool_Type;
package Ada.Containers.Vectors is ...;

but then this is in conflict with (3), because now there's another
formal type (which cannot be defaulted).

In C++ generic formal pool objects ("allocators") are allowed to have a
default, by constructing an allocator on-the-fly.  But then there's some
rule about the STL that requires allocator objects be shared (or
something like that)???  And then it complicates things for implementors
because you have to use the "empty virtual base class" trick to avoid
allocating padding for otherwise empty classes.

Realize that adding custom allocator support to the STL complicated the
semantics somewhat (do objects have the same or different allocators? --
affects assignment rules, etc).

An early version of Charles allowed you to pass in a storage pool, but I
eventually gave it up because it was too many headaches for casual users
who didn't care about supplying their own pool.

If you've studied my reference implementation then you might have
noticed that the substrate package (e.g. charles.red_black_trees) used
to implement the higher-level container is written so that all the
allocation and deallocation is done by the higher-level package.   The
substrate package is completely agnostic about how storage allocation
gets done.  This allows the user of the instantiation of the red-black
tree (say) to use a pool if he wants, or indeed even statically allocate
the nodes.  In fact the container elements can even be limited.  The
substrate package doesn't care.  All the ugliness is hidden from the
container user by the wrapper container package.

So I reached the conclusion that if someone (like, um, Bob Duff, who has
written lots of custom storage pools) needs a special sorted set that
uses some fancy storage pool, then it's not too hard to do that using
the substrate package directly and building his own wrapper class.

Perhaps there is a way to do this.  It may be that there's some slick
language trick that I haven't figured out that would allow the user to
pass in his own pool without too much pain at instantiation time.

There is also the multi-threading issue.  Clearly the user has the
responsibility to not allow concurrent access to the same container
object, but what about different threads each manipulating their own
container object, so we have multiple container objects (and hence
multiple threads) sharing a common pool object?  But I suppose you could
use the same synchronization mechanism you use for alligator new.

Maybe you could make some other (non-limited) abstraction, and pass that
in as the default, e.g.

    generic
       type ET is private;
       Pool : Pool_Handle := Default_Pool;  --from somewhere
    package Generic_Containers is ...;

but the language doesn't give you anything like the placement new
construct in C++, which allows you to construct an object in-place, at a
location you specify.

There is a sort of hack you can do by declaring a pool object
on-the-fly, that binds to an object (in some raw form e.g. storage
elements) to be constructed using an access discriminant.  Then you make
a dummy call to new, and internally the pool object specifies the
address of the object (to which the pool object is bound) as the address
return value.  The run-time system will then call Initialize on that
object.  Placement new in Ada95!  But that's kind of a trick and I don't
really know if it will work.

****************************************************************

From: Tucker Taft
Sent: Wednesday, February 11, 2004  9:31 AM

I believe the intent is that all of these containers
use controlled types to avoid storage leakage, analogous
to what unbounded strings do.  (In fact, I could imagine
that a vector and an unbounded string would have a lot
in common under the covers.)

So I'm not sure how a user-defined pool would interact
with that (and I fear based on our own experience that
putting finalizable things in user-defined pools can be
tricky).

Note that this will give more incentive for implementors
to "sharpen up" their implementation of controlled types.
I think that is a good thing, so we are spending energy
improving existing features of the language, rather than
dissipating energy on lots of different ways of skinning
the same cat.

****************************************************************

From: Tucker Taft
Sent: Wednesday, February 11, 2004  5:13 PM

Matthew Heaney wrote:
> ...
> (3) It's in conflict with our design principle that components be easy
> to instantiate and use.  I would love to have a generic formal pool
> object default a la "is <>" or "is <name>", but the language doesn't let
> you specify defaults for generic formal objects.

I generally agree with your reasoning, but this particular statement
is false, unless you say "formal IN OUT objects."  Formal IN objects
certainly can have defaults, and the way to pass in a storage pool
would be as Randy suggested, via an access value.  The default could
be implementation-defined, with the semantics that it implies the
standard storage pool, if allowed to default.

But I still believe my earlier response, that mixing user-defined
storage pools and controlled types is asking for complexity, and
doesn't seem to buy enough to justify itself.

One reason to have a user-defined storage pool is to do some kind
of garbage collection, or to do mark/release.  Both of those could
easily interfere with the implementation of controlled
types, unless the user was very careful, and had a pretty good idea about
how controlled types were implemented.

****************************************************************

From: Simon J. Wright
Sent: Thursday, February 12, 2004  3:12 AM


Marc A. Criley wrote:

> Where are the journeyman programmers for whom Ada is just the
> language they write code in going to find data structures?  If it
> doesn't show up in the reference manual, it'll be borrowed from some
> home- or project-grown thing that was done before, get ginned up yet
> again from scratch, or maybe get copied out of a dog-eared Ada
> textbook.

For a project of any size I would not expect journeyman programmers
to be making this sort of choice; it should be a matter of policy set
by the software architect(s), along with "how we use tasks", "how we
deal with exceptions" etc.

So the question is, where do the architects find stuff? and clearly
the ARM is a very good start (though I have to admit there are parts
of Annex A that I'm not at all familiar with and should be!
Strings.Maps, for example).

I started maintaining the BCs because I needed containers for a demo
project. Although it has been fun, I would never have done so if the
proposed library had been available, and I strongly support it.

****************************************************************

From: Marc A. Criley
Sent: Thursday, February 12, 2004  7:46 AM

Speaking as a software architect for both Ada and C++ projects, your
characterization of this aspect of the architect's job is quite correct.
One of my sub-tasks was ensuring that only the authorized container classes
were being used, even to the point of once having to threaten to reject a
developer's code if he didn't start using STL instead of coding up his own
comparable classes.

However, I've had plenty of experience with other architects and leads who
aren't on this mailing list, don't visit comp.lang.ada (or comp.lang.c++),
aren't on the Team-Ada mailing list, don't subscribe to any technial
magazines or journals, aren't in ACM, don't go home and code at night or on
weekends, and own only one book covering each programming language they have
to deal with, whether it be Ada, C++, Java, Perl, etc.

When they need container classes, they check the project's code base or they
go to their books, which for Ada does usually include the reference manual.
That's where container  availability needs to be publicized, because if they
do find something on the Web (like Booch or Charles or PragmArc), they need
to first expend the effort to convince themselves that that "home-grown"
collection is something that would be useful, and then overcome most
management's resistance to using "free", unsupported software.  Making
Ada.Containers part of the standard language distribution gives it a cachet
of legitimacy that means that architects/leads don't have to fight that
fight.  And they end up with a container collection that will handle the
needs of most non performance critical projects.

****************************************************************

From: Stephen Leake
Sent: Thursday, February 12, 2004  11:48 AM

Just to state my position for the record:

I like AI-302-3.

I think the rationale for the design needs to be more clearly stated
(particularly why the key and element types are definite).

Examples of how to build packages supporting indefinite types would be
good; an actual standard for that (layered on top of the current one)
would be better, but I can wait for that.

I like the term "cursor" instead of "iterator"; "iterator" is clearly
overloaded, while "cursor" matches the usage in SQL.

Since these packages are intended to be low-level building blocks, I'd
rather see them called "unbounded_array", "hashed_map", and
"sorted_tree". But that's a small issue.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:41 PM

See the latest reference implementation (Thu, 12 Feb 2004) for examples
of using the canonical containers to implement indefinite vectors and
indefinite sets.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>

I'll have at least one example of a map of indefinite elements tomorrow.


> I like the term "cursor" instead of "iterator"; "iterator" is clearly
> overloaded, while "cursor" matches the usage in SQL.

The Iterator design pattern described in the Gamma book says that
"Cursor" is an alias for the term "Iterator", so you seem to be in good
company.


> Since these packages are intended to be low-level building blocks, I'd
> rather see them called "unbounded_array", "hashed_map", and
> "sorted_tree". But that's a small issue.

Low-level is a point of view.

It's a vector implemented as an unbounded array, not an unbounded array
per se.

It's a map, implemented using a hash table, but not a hash table per se.

It's a sorted set, implemented using a balanced (red-black) tree, but
not a tree per se.

Yes, they're building blocks.  Yes, they're low level.  But they're not
as low-level as unbounded arrays, hash tables, and red-black trees.

****************************************************************

From: Stephen Leake
Sent: Friday, February 13, 2004  8:19 AM

> > "sorted_tree". But that's a small issue.
>
> Low-level is a point of view.
>
> It's a vector implemented as an unbounded array, not an unbounded
> array per se.

Hm. Let's compare Ada.Containers.Vectors to SAL.Poly.Unbounded_Array.

Vectors has Insert in the middle, Sort, and Element_Access. SAL allows
indefinite and limited items. Otherwise they are the same.

Sort is a reasonable operation for any container; I would put it in a
child package, since many applications won't need it.

I guess that means SAL.Poly.Unbounded_Array is actually a "vector"?
What would a true low-level unbounded_array look like?

> It's a map, implemented using a hash table, but not a hash table per se.

I need to see your definition of "hash table"; this looks like one to me.

> It's a sorted set, implemented using a balanced (red-black) tree, but
> not a tree per se.

This one I'll grant you; it is more complex than just a tree.

> Yes, they're building blocks.  Yes, they're low level.  But they're
> not as low-level as unbounded arrays, hash tables, and red-black trees.

As long as the names are sufficiently clear, and it is clear how to
name new components that complement these, I'm happy. As I see it, all
of these names have loose enough definitions that this issue is _not_
a show stopper.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:33 AM

> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).

The Size function is analogous to the capacity() member function in the
STL vector class.

The Resize procedure is analogous to the reserve() member function.

A vector container is implemented internally as a contiguous array, that
expands as items are inserted into the container.

The Size function returns the length of the internal array.  The Length
function returns the number of elements in the array that are "active,"
that have actually been inserted into the vector.

At all times a vector satisfies the invariant that

    Length (V) <= Size (V)

The procedure Resize tells the vector to expand to at least the size
specified in the call.

If the current size is equal to or greater than the value specified,
then Resize does nothing.

If the current size is less than the value specified, then the internal
array is expanded.  The standard does not specify the exact algorithm
for expansion, and only requires that the Size function return at least
the value specified.

There's nothing special an implementation needs to do to keep track of
the current value of the size, since it has that information already:
it's just the result of the 'Length attribute for the internal array.

>>Resize is an appropriate name for the operation as specified. I expect
>>an operation named Resize to cause resizing. If we're really talking
>>about giving the implementation a hint about an appropriate size, then
>>not only does the specification need to be changed, the name also needs
>>to be different (perhaps Size_Hint?).

The semantics for Resize are described above.

> I don't see a strong need to change the name, but I do agree with you that
> there shouldn't be a *requirement* to do some allocation.

There is a requirement for allocation only if the current size is less
than the size specified in the call to Resize.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  9:58 AM

> We compiler writers can probably even get Matt to code up the
> implementation for us.  ;-)

Ask, and you shall receive...

The latest version (12 Feb 2004) of the reference implementation has an
example of a sorted map, implemented using the sorted set and by
instantiating its nested generic package Generic_Keys.

There are also two examples of hashed sets, for both definite and
indefinite elements.  This standard doesn't have a hashed set but if it
did then this is what it would look like.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>

****************************************************************

From: Stephen Leake
Sent: Thursday, February 12, 2004  11:55 AM

> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).

I think that's a bad idea. It means my scenario (preserve the max size
of a container in a config file, and set it next time on startup)
won't work. If I set the max size from yesterday on startup today, but
then the container grows larger today, when I exit and query the size,
I won't get the larger correct size, but just the one I set at the
begining.

Setting the size should be a hint, but only for a starting point.
Querying the size should always return the current size.

****************************************************************

From: Jeffrey Carter
Sent: Thursday, February 12, 2004  12:47 PM

Randy Brukardt wrote:

> Yup. That's precisely how Type'Size works in Ada; it has a fairly
> weak effect on Obj'Size, but in any case, if you set it, you have to
> return the same value (even if that value has nothing to do with how
> objects are actually stored).

Not quite precisely. There are cases where a compiler is required to use
the specified 'Size.

> One allocation per key is a lot more than one allocation per *map*,
> which is what a stringspace implementation takes. (Well, it might
> have to expand if it gets full, but that should be rare. It could
> degrade to one allocation per key if the keys are very, very long,
> but some care in implementation should prevent degrading.)

OK. I misunderstood.

> I have a component like that (it's actually Tom Moran's), but in
> practice, I've *never* used it without using the index values it
> provides to manage some other data in a separate table (at least
> statistics and/or debugging). Even the 'known words' list in the spam
> filter uses the indexes (handles) for debugging. If that's the case,
> why bother having to use a separate component (causing another chance
> of error)?
>
> So I would guess that the "dummy type" would gain some real data in
> 95% of the applications. And that such uses are less than 10% of the
> uses of a map anyway. Since this is a minimal library, we're not
> trying to cover that remaining 0.5%.

It's not "another component"; it's the underlying implementation of the
hashed map component. My point is that we're requiring the
implementation of a hash table, which is a useful component, but not
requiring that it be provided to users. That's like requiring that a
compiler be able to convert strings into numbers, but not having 'Value
in the language. It doesn't require any additional work by implementors,
nor introduce an additional opportunity for errors, but it does increase
the utility of the library.

To me it's a no brainer, as is converting the map part of the "sorted
set" component (Generic_Keys) into its own component: it's no additional
work for implementors, and allows the user to obtain a sorted map with a
single instantiation, instead of 2.

Put another way, the library has 2 different approaches to defining a
map. In one, we have a map component and hide the underlying
implementation (the hash table). In the other, we have the "sorted set"
component, and then a map component implemented in terms of it. We
should at least be consistent, and I argue that our consistency take the
form of providing both the underlying implementation and the map
implemented in terms of it.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:21 PM

> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users.

A hash table might be at the wrong level of abstraction (too low).  The
hashed map actually takes the level of abstraction up a notch.

In my original proposal, I allowed the user to query each bucket of the
underlying hash table array, but the subcommittee rejected that approach
as too low level, in favor of higher-level First and Succ active
iterator operations.

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  3:43 PM

> Setting the size should be a hint, but only for a starting point.
> Querying the size should always return the current size.

Yes, querying the size should always return length of the internal array.

If the value specified in the call to Resize is larger than the current
length of the internal array, then the internal array is expanded to at
least the length specified.

****************************************************************

From: Simon J. Wright
Sent: Thursday, February 12, 2004 10:42 AM

> The Size function is analogous to the capacity() member function in the
> STL vector class.
>
> The Resize procedure is analogous to the reserve() member function.

...

Do we really need these operations? I presume that they support
optimisation by allocating extra space ahead of time -- do our users
really need that? (assuming of course that the vector will resize
itself if it finds it needs to).

****************************************************************

From: Matthew Heaney
Sent: Thursday, February 12, 2004  11:10 AM

Yes, it makes the optimization you describe -- the Resize preallocates
an internal array large enough to contain all future insertions.

The optimization is important especially for very large numbers of
elements.

But don't take my word for it.  Measure the performance of this procedure:

procedure Not_Optimized (V : in out Vector_Type) is
begin
    for I in 1 .. 1_000_000 loop
       Append (V, New_Item);
    end loop;
end;

and then compare it to this one:

procedure Optimized (V : in out Vector_Type) is
begin
    Resize (V, Size => 1_000_000);

    for I in 1 .. 1_000_000 loop
       Append (V, New_Item);
    end loop;
end;

If you really want to see a difference then use a complex element type,
perhaps one that is controlled and does lots of internal allocation.

I know it makes a difference because I've actually had the problem.  In
my streaming media server, when a file is requested I must load large
indexes comprising several hundred thousand elements that describe the
frames in the file (these are 2 hour movies).

When I first wrote the server there was a huge spike in the CPU monitor
whenever I loaded a file, and this tended to disrupt existing streaming
clients.  (This is a real-time streaming media server, and I have to
service several hundred clients simultaneously.)

I did some analysis and realized is was population of the index vector
that was the cause of my problem.  So I just figured out my total number
of indexes before inserting and then did a Resize.  And now all it well.

So performance matters, and therefore we should keep Size and Resize.
Of course, if your vector objects are small, or you don't have any
special performance needs, then you can just ignore Resize and the
vector will work fine.

****************************************************************

From: Robert A. Duff
Sent: Thursday, February 12, 2004  12:01 PM

I would expect the former to about lg(1_000_000) = 20 allocations,
and the latter to do 1 allocation, presuming the growth is exponential,
which it should be.  (E.g. double the size each time you run out
of space.)

> So performance matters, and therefore we should keep Size and Resize.

I agree.  I use a similar growable array abstraction quite heavily in my
current project, and there are cases where the code knows the size ahead
of time (or can guess), and I care enough about speed to do the Resize.

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Thursday, February 12, 2004  3:29 PM

> Yes, but access types themselves are not tagged. What they point at is irrelevant.
> If you have a formal "type T is tagged private;" no access type will match
> that; it's the same for interfaces.

Still don't understand: if we can do something useful with, say, an array
(or Unbounded_Array) of interface objects then why can't do the same with
an array of accesses to interface objects - just dereferencing them before
calling a member of the interface?

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 12, 2004  4:58 PM

Because you can't create a container (a map, say) of access types in this
model. Remember, an interface has no implementation, so at some point you
have to have a concrete implementation.

Let me try to give a very simple example:

(* Warning *) This is not a serious proposal! (* End Warning *)

   package Ada.Containers is
       type Element_Interface is interface;
       -- Any element operations here (I don't think there need to be any).

       type Cursor_Interface is interface;
       -- Any common cursor operations here.
   end Ada.Containers;

   package Ada.Containers.Interfaces is
       type Forward_Iterator_Container_Interface is interface;
       function Null_Cursor (Container : Forward_Iterator_Container_Interface) return
            Cursor_Interface'Class is abstract;
       function Front (Container : Forward_Iterator_Container_Interface) return
            Cursor_Interface'Class is abstract;
       procedure Increment (Container : Forward_Iterator_Container_Interface;
            Cursor : in out Cursor_Interface'Class) is abstract;
       function Element (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Element_Interface'Class
                 is abstract;
       ...
       -- (It might make more sense to put the "iterator" operations on the Cursor_Interface.
       -- But then you'd need a separate interface just for element access through a cursor.)
   end Ada.Containers.Interfaces;

   with Ada.Containers.Interfaces;
   generic
        type Key_Type is private;
        type Element_Type is new Element_Interface;
        ... -- As before
   package Ada.Containers.Maps is
        type Map_Type is new
           Ada.Containers.Interfaces.Forward_Iterator_Container_Interface
               with private;
           -- Of course, other useful interfaces also would be included here.
           -- Probably including a "map" one.
        type Cursor_Type is new Cursor_Interface with private;

        ... -- As before. (With appropriate Null_Cursor and Increment routines).
   end Ada.Containers.Maps;

Now, to use this, the element type has to 'have' the Element_Interface
interface:
    type My_Element_Type is new Ada.Containers.Element_Interface ... with ...;
You can't instantiate the container with a scalar type or an access type or an
array type or any record that doesn't have the Element_Interface interface.

Now, the point of all of this is that you now can write an iteration routine
that will work for any container having the
Forward_Iterator_Container_Interface. For instance, to create a passive
iterator, you could do (this of course isn't useful, but the ability to write
such things is):

    generic
        with procedure Process (Element : in Element_Interface'Class);
    procedure Iterator (Container : Forward_Iterator_Container_Interface'Class);
    procedure Iterator (Container : Forward_Iterator_Container_Interface'Class) is
        Current : Cursor_Interface'Class := Front (Container);
    begin
        while Current /= Null_Cursor (Container) loop
             Process (Element (Container, Current));
             Increment (Container, Current);
        end loop;
    end Iterator;

Moreover, the instantiations are pretty much the same as the current
proposal. But the element types are limited to tagged types.

****************************************************************

From: Ehud Lamm
Sent: Thursday, February 12, 2004  2:43 AM

But signature packages would work ok, wouldn't they?

****************************************************************

From: Randy Brukardt
Sent: Thursday, February 12, 2004  5:06 PM

Signature packages violate the meta-rule about ease of instantiation: as few
instantiations as possible to get a usable container. (That's one
instantiation, of course.) As far as I can tell, to use them like
interfaces, they'd have to be a parameter to the generic container package.

But perhaps you had something else in mind.

In any case, I don't like signature packages. They add layers of overhead on
a generic sharing implementation (every generic package has a cost, the more
you use, the more that cost is), turning the performance of pretty much
anything into that of bad Java code. (That's not a problem if the signature
doesn't contain anything "expensive", but trying to define that - and work
around it - is a fool's game.)

****************************************************************

From: Randy Brukardt
Sent: Friday, February 13, 2004  12:13 AM

Jeffrey Carter:

> > Yup. That's precisely how Type'Size works in Ada; it has a fairly
> > weak effect on Obj'Size, but in any case, if you set it, you have to
> > return the same value (even if that value has nothing to do with how
> > objects are actually stored).
>
> Not quite precisely. There are cases where a compiler is required to use
> the specified 'Size.

Not for a (sub)type. 13.3(48) says that an object's size is *at least* as
large as the specified size. Anything else said is "advice".

...
> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users. That's like requiring that a
> compiler be able to convert strings into numbers, but not having 'Value
> in the language. It doesn't require any additional work by implementors,
> nor introduce an additional opportunity for errors, but it does increase
> the utility of the library.

Not true at all. Building a separate hash table component and then building
a map on top of that would be a horrible implementation performance-wise.
Lots of extra call and generic overhead. So, in practice, they'd have
completely separate implementations -- thus, you'd be doubling the work.

Moreover, the component you're describing (a hash table without elements)
wouldn't have any place to *put* elements. So I don't see how you could even
use it to implement the map. (The hash table component you're suggesting
would return a Cursor object to represent each key, but that item isn't an
index that you could use in a sequence. So how would you associate a key
from the hash table with an element? A linear list would work, but would
essentially make the hash table useless.)

What I suspect would happen in practice is that the relatively useless hash
table component would be implemented in terms of a map with a null record
element type. What's the point in that - the user can do that themselves if
they need it?

> To me it's a no brainer, as is converting the map part of the "sorted
> set" component (Generic_Keys) into its own component: it's no additional
> work for implementors, and allows the user to obtain a sorted map with a
> single instantiation, instead of 2.

Matt will tell you that the difference between a Sorted_Set using
Generic_Keys and a Map (any kind) is that the key doesn't have a separate
existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
is separate from the element. There's obviously a significant space
advantage to avoiding duplicate keys.

I originally deleted the Generic_Keys component as redundant (because I too
thought it was a Map), then put it back after a discussion on C.L.A. showed
how important it is.

Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
people have asked that the List be put back. But that would quickly ballon
the proposal to double its size, and in any case smacks of "feeping
creaturism". :-)

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  8:32 AM

> Not true at all. Building a separate hash table component and then building
> a map on top of that would be a horrible implementation performance-wise.

Gulp!  I guess Randy hasn't looked at the reference implementation yet...

> Lots of extra call and generic overhead. So, in practice, they'd have
> completely separate implementations -- thus, you'd be doubling the work.

Jeff may have assumed (perhaps by looking at the reference
implementation) that implementors would implement the (hashed) map as a
layer on top of a separate generic hash table component.  But as Randy
notes, implementors won't necessarily implement the map container that
way, and so Jeff is basically advocating that another component
(specifically, a low-level hash table data structure) be added to the
standard library.

> Matt will tell you that the difference between a Sorted_Set using
> Generic_Keys and a Map (any kind) is that the key doesn't have a separate
> existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
> is separate from the element. There's obviously a significant space
> advantage to avoiding duplicate keys.

What Randy told you Matt would tell you is correct...

> I originally deleted the Generic_Keys component as redundant (because I too
> thought it was a Map), then put it back after a discussion on C.L.A. showed
> how important it is.

Yes.  It allows the instantiator to take advantage of properties of the
generic actual set element type that the generic set itself isn't privy
to.  See for example the Indefinite_Sets example in the reference
implementation.

> Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
> and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
> people have asked that the List be put back. But that would quickly ballon
> the proposal to double its size, and in any case smacks of "feeping
> creaturism". :-)

What Randy told you Matt would tell is once again correct...

****************************************************************

From: Marius Amado Alves
Sent: Friday, February 13, 2004  12:54 PM

I've updated Truc: the "100% Ada" claim is now true. The URL is the same
(www.liacc.up.pt/~maa/containers/truc.ada)

Truc features an implementation of indefinite elements using streams,
alternate to Matt's approach using controlled deallocation. This could be of
interest to implementors. But remember Truc was a proof-of-concept and is
missing many standard functions.

The other principal feature of Truc is now merely academic, that it choses
automatically the most appropriate implementation to the actual element type
w.r.t. definiteness. It is now settled that the choice will be manual (done
by the user).

****************************************************************

From: Dan Eilers
Sent: Friday, February 13, 2004  6:44 PM

I think its a little to soon to say that manual choice is settled.
Certainly it is agreed that there should not be any overhead from
support of indefinite types forced onto users of definite types.

But a user really probably prefers not to have to worry about which
flavor of each container to instantiate, just like users of
generic_elementary_functions currently don't have to explicitly
select between single and double precision versions.

You earlier proposed a language extension as an aside:
> Aside. Of course there is still no standard means to do this, but it
> would be a nice extension. Conditional compilation of generic bodies
> based on instantiation properties. Variant units :-)
>   generic
>     type T is private;
>     ...
>   package G is
>     when T'Definite =>
>       ...;
>     when others =>
>       ...;
>   end;
> (On the subject of conditional compilation, see also the recent Ada
> Preprocessor thread on CLA.)

This looks like too large of a change for the benefit, but there
may be a simpler change that would work.  For example, by extending
the syntax for renames to allow a conditional expression, as in:

    generic package p1 is
    end p1;

    generic package p2 is
    end p2;

    with p1, p2;
    generic package p3 renames (if condition then p1 else p2);

****************************************************************

From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004  9:38 PM

> Because you can't create a container (a map, say) of access types in this
> model. Remember, an interface has no implementation, so at some point you
> have to have a concrete implementation.

I remember that, but I still can't get how it may be possible that

1) we can create a container of interfaces and
2) we can create a container of accesses and
3) we have accesses to interfaces

but at the same time we cannot create a container of accesses to interfaces.

I don't understand how the delayed implementation of interfaces may create
this situation. Let me follow your example:

> Let me try to give a very simple example:
>
> (* Warning *) This is not a serious proposal! (* End Warning *)
>
>   package Ada.Containers is
>       type Element_Interface is interface;

Let's change the above line to:

        type Item_Interface is interface;
        type Element_Access is access all Item_Interface;

>       -- Any element operations here (I don't think there need to be any).
>
>       type Cursor_Interface is interface;
>       -- Any common cursor operations here.
>   end Ada.Containers;
>
>   package Ada.Containers.Interfaces is
>       type Forward_Iterator_Container_Interface is interface;
>       function Null_Cursor (Container : Forward_Iterator_Container_Interface) return
>            Cursor_Interface'Class is abstract;
>       function Front (Container : Forward_Iterator_Container_Interface) return
>            Cursor_Interface'Class is abstract;
>       procedure Increment (Container : Forward_Iterator_Container_Interface;
>            Cursor : in out Cursor_Interface'Class) is abstract;
>       function Element (Container : Forward_Iterator_Container_Interface;
>            Cursor: Cursor_Interface'Class) return Element_Interface'Class is abstract;

and the above function to:

        function Element (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Element_Access is abstract;
        function Item (Container : Forward_Iterator_Container_Interface;
            Cursor: Cursor_Interface'Class) return Item_Interface'Class is abstract;

>       ...
>       -- (It might make more sense to put the "iterator" operations on the Cursor_Interface.
>       -- But then you'd need a separate interface just for element access through a cursor.)
>   end Ada.Containers.Interfaces;
>
>   with Ada.Containers.Interfaces;
>   generic
>        type Key_Type is private;
>        type Element_Type is new Element_Interface;

change above line to

         type Item_Type is new Item_Interface;
         type Element_Type is access all Item_Type;

>        ... -- As before
>   package Ada.Containers.Maps is
>        type Map_Type is new
>           Ada.Containers.Interfaces.Forward_Iterator_Container_Interface with private;
>           -- Of course, other useful interfaces also would be included here. Probably
>           -- including a "map" one.
>        type Cursor_Type is new Cursor_Interface with private;
>
>        ... -- As before. (With appropriate Null_Cursor and Increment routines).
>   end Ada.Containers.Maps;
>
> Now, to use this, the element type has to 'have' the Element_Interface interface:
>     type My_Element_Type is new Ada.Containers.Element_Interface ... with
> ...;

correspondily:

  Now, to use this, the element type must be access to a type that has to 'have' the
  Item_Interface interface:
     type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
     type My_Element_Type is access all My_Item_Type;

> You can't instantiate the container with a scalar type or an access type or
> an array type or
> any record that doesn't have the Element_Interface interface.

But now, with the above changes we can instantiate the containter with access
to a tagged type that has Item_Interface interface.

Where I am wrong here - in which point/step?

****************************************************************

From: Randy Brukardt
Sent: Friday, February 13, 2004  9:47 PM

...
> correspondily:
>
>   Now, to use this, the element type must be access to a type that has to 'have' the
>   Item_Interface interface:
>      type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
>      type My_Element_Type is access all My_Item_Type;
>
> > You can't instantiate the container with a scalar type or an access type or
> > an array type or any record that doesn't have the Element_Interface interface.
>
> But now, with the above changes we can instantiate the containter with access
> to a tagged type that has Item_Interface interface.
>
> Where I am wrong here - in which point/step?

This works, of course, but now you can only instantiate with
access-to-interfaces. That's even more limiting than just interfaces -
because you have to do all of the memory management yourself. If you've been
following along here, I'm sure you've noticed that that won't do.

You could of course support this as an alternative implementation with both
sets of stuff around. But then you've instantly doubled the size of the
library -- and you still can't have a container of floats or of arrays
(especially of unconstrained arrays). Wrappers are very space-inefficient in
the first case, and barely possible for unconstrained arrays (the code to
use them will be very ugly).

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:08 AM

The current version of the reference implementation has examples of
indefinite sets, maps, and vectors.

<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040213.zip>

However, I have discovered a potential anomaly in indefinite containers
that I wanted to make users aware of.

An indefinite container is implemented by storing a pointer to the
(indefinite) element, and doing the allocation and deallocation of the
element behind the scenes during insertion and deletion.

The issue comes up in the item-less forms of insertion.  In that case,
there is a null pointer for the element.  This has several consequences.

Consider the vector.  When we do an item-less insert, does that mean we
copy the internal pointer up to the next position, and leave a null
pointer at the insertion position?  Or do we leave the original element
there and make a copy of the element to slide up?

When we delete vector elements, do we move pointers down, and leave null
element pointers behind?  Or are we required to make a copy to slide down?

What should the passive iterator do when it hits a null element pointer?
Skip that position or just raise Constraint_Error?

Should we generalize Replace_Element, to allow a "null element" as the
replacement value?

Does Generic_Element return a null pointer if the element pointer is
null, or does it raise CE?

What should sort do with null elements?  Assume that a null element is
always less than a non-null element?

This affects streaming of elements too, because you have to stream out
an extra bit to indicate whether the element is null or not.

My tentative assumption is that we'll have to omit the item-less
insertion operations in the indefinite containers.  This mostly applies
only to vector and map, but I still have to analyze the behavior of
indefinite set Generic_Keys nested package.

The reference implementation doesn't do anything special for indefinite
vectors.  The indefinite map handles null elements.  I will fix both
this weekend, as I prepare an errata list for Randy.

If the developers who want indefinite containers have an opinion about
these matters than please speak up.

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************

From: Matthew Heaney
Sent: Friday, February 13, 2004  9:31 AM

****************************************************************


Questions? Ask the ACAA Technical Agent