Version 1.19 of ais/ai-30302.txt
!standard A.17 04-02-13 AI95-00302-04/00
!class amendment 04-02-13
!status No Action (11-0-0) 05-04-16
!status work item 04-02-13
!status received 04-02-13
!priority Medium
!difficulty Hard
!subject Container library (mail container)
!summary
This is a dummy AI created solely to hold the voluminous mail on this topic.
See AI-302-03 for the actual proposal.
!problem
!proposal
!wording
!example
--!corrigendum A.17
!ACATS Test
!appendix
[Editor's note: For mail earlier than February 4, 2004, see AI-302-3.]
****************************************************************
From: Martin Dowie
Sent: Wednesday, February 4, 2004 8:21 AM
Is package Ada.Containers.Maps.Strings[ACMS] really what is
intended, as Ada.Containers.Maps[ACM] is generic this means
to use ACMS a user must first instantiate ACM and then
instantiate ACMS.
Charles didn't suffer from this problem as Unbounded maps (~ACM)
and String Maps (~ACMS) were siblings not parent/child.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 4, 2004 8:57 AM
>2) For routines like 'Generic_Iteration' shouldn't the 'Process'
> generic subprogram parameter not have a 'Stop : out Boolean'
> parameter? To allow early exit of the iteration, without
> having to raise exceptions?
Just use an active iterator.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 4, 2004 9:52 AM
Note that that's not really the correct mode anyway: it should be in out,
not just out, like this:
   generic
      with procedure Process
        (Cursor : in     Cursor_Type;
         Done   : in out Boolean) is <>;
   procedure Generic_Iteration (Map : in out Map_Type);
The problem with just out-mode is that you always have to give the
parameter a value. But this is wrong, since you shouldn't be compelled
to say anything if you merely want to continue. You should only have to
say something when you want to stop.
If you only want to visit some of the items, then just use an active
iterator, and exit the loop when you need to:
   declare
      I : Cursor_Type := First (M);
      J : constant Cursor_Type := Back (M);
   begin
      while I /= J loop
         declare
            E : Element_Type := Element (I);
         begin
            --do something with E
            exit when Predicate (E);
         end;
         Increment (I);
      end loop;
   end;
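[Editor's note: The following is a minimal sketch of the in out "Done"
idea discussed above, not part of the proposal. It iterates over a plain
array rather than a Map, and the names Int_Array, Stop_At_Negative, Scan and
Visited are assumptions made up for illustration. Because Done has mode
in out and starts as False, Process only has to assign it when it wants to
stop.]

```ada
with Ada.Text_IO;

procedure Early_Exit_Demo is

   type Int_Array is array (Positive range <>) of Integer;

   --  Passive iterator whose Process routine may request an early exit.
   generic
      with procedure Process (Item : in     Integer;
                              Done : in out Boolean);
   procedure Generic_Iteration (Items : in Int_Array);

   procedure Generic_Iteration (Items : in Int_Array) is
      Done : Boolean := False;  --  Process leaves this alone to continue
   begin
      for I in Items'Range loop
         Process (Items (I), Done);
         exit when Done;
      end loop;
   end Generic_Iteration;

   Visited : Natural := 0;

   --  Counts the items seen and stops at the first negative one.
   procedure Stop_At_Negative (Item : in     Integer;
                               Done : in out Boolean) is
   begin
      Visited := Visited + 1;
      if Item < 0 then
         Done := True;
      end if;
   end Stop_At_Negative;

   procedure Scan is new Generic_Iteration (Stop_At_Negative);

begin
   Scan ((3, 1, -4, 1, 5));
   Ada.Text_IO.Put_Line ("Visited:" & Natural'Image (Visited));
end Early_Exit_Demo;
```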
****************************************************************
From: Martin Dowie
Sent: Wednesday, February 4, 2004 10:06 AM
[snip]
> If you only want to visit some of the items, then just use an active
> iterator, and exit the loop when you need to:
[snip]
I could but wasn't part of the purpose of the library to allow us to
do common things more easily? And I'd have to say I'd use a 'Quit'
version a _lot_ more than the current process everything,
every time one.
I'd be delighted if both versions could be included! :-)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 4, 2004 11:16 AM
Dowie, Martin (UK) wrote:
> I could but wasn't part of the purpose of the library to allow us to
> do common things more easily? And I'd have to say I'd use a 'Quit'
> version a _lot_ more than the current process everything,
> every time one.
It would be helpful if you could be specific about what kind of
container you were using.
The vector has neither active nor passive iterators, which means that
for a vector you have to use a loop anyway.
For the hashed map, I would find it very odd if you needed to traverse
only some of its elements, since elements are stored in hash order.
What would be the nature of the predicate?
The sorted set is the borderline case.
****************************************************************
From: Peter Hermann
Sent: Wednesday, February 4, 2004 5:57 AM
> package Ada.Strings.Case_Insensitive is
indeed useful.
expected to be overloaded for fixed and (un)bounded strings.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 4, 2004 9:02 AM
>Is package Ada.Containers.Maps.Strings[ACMS] really what is
>intended, as Ada.Containers.Maps[ACM] is generic this means
>to use ACMS a user must first instantiate ACM and then
>instantiate ACMS.
That's definitely a bug in the report. The string-key map is not a
child of a generic. Maybe we should do this:
package Ada.Containers.Maps
package Ada.Containers.String_Maps
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 4, 2004 9:07 AM
Yes, please change that. There is a steady requirement that a single
instantiation must be enough to get a container.
****************************************************************
From: Pascal Obry
Sent: Wednesday, February 4, 2004 10:10 AM
> The problem with just out-mode is that you always have to give the
> parameter a value. But this is wrong, since you shouldn't be compelled
> to say anything if you merely want to continue. You should only have to
> say something when you want to stop.
Agreed, this is the way iterators are designed in the POSIX 1003.5 standard
for example.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 4, 2004 9:02 AM
> 2) For routines like 'Generic_Iteration' shouldn't the 'Process'
> generic subprogram parameter not have a 'Stop : out Boolean'
> parameter? To allow early exit of the iteration, without
> having to raise exceptions?
Indeed some people ban the use of exceptions for control flow. I guess they
are not a majority in the committee. Fortunately ;-)
/* However to take the exception route the exception should be defined.
(Exit/Terminate_Immediately, _Now, _Prematurely?) Or a specification be made
of what exceptions the iterator is guaranteed to propagate. Simply "all"
would do. Maybe this is already there. I'm sorry, I didn't have time to read
the AI fully yet. */
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 4, 2004 8:55 PM
Marius Amado Alves wrote:
> Sorry for my poor knowledge of ARG procedure.
> Does this step mean the library is secured for Ada 2005?
What it means is that the study committee has issued a report. No more, and
no less. I would hope that we know more after the March ARG meeting, but
there is no guarantee that we'll work on it (we never seem to get to
everything on the agenda - we didn't work on AI-351, Time Ops in San Diego,
for instance).
Primarily, we just "cleaned up" Matt Heaney's proposal. We didn't change (as
opposed to remove) functionality, with the exception of the Map container
(where we reverted to a design more like the one Charles actually uses -
with Matt's input). So the vast majority of design decisions are Matt's --
we'd prefer to avoid design-by-committee.
Martin Dowie wrote:
> Is package Ada.Containers.Maps.Strings[ACMS] really what is
> intended, as Ada.Containers.Maps[ACM] is generic this means
> to use ACMS a user must first instantiate ACM and then
> instantiate ACMS.
Nope, that's clearly a bug. String_Maps ought to be usable by itself (it
doesn't depend on the other package at all). (And this one is my fault, for
not noticing the effect of the change.)
And later, replying to Matt:
>[snip]
>> If you only want to visit some of the items, then just use an active
>> iterator, and exit the loop when you need to:
>[snip]
>I could but wasn't part of the purpose of the library to allow us to
>do common things more easily? And I'd have to say I'd use a 'Quit'
>version a _lot_ more than the current process everything,
>every time one.
My understanding of Matt's design is that you use the passive iterator when
you want to process everything (which is by far the most common), and you
use an active iterator when you want to process part of the items. You might
use an exception to terminate iteration in an error case, but not if you
intended only to process part of the items. (Of course, there is no law
requiring that, so YMMV!)
I hadn't noticed that there is no passive iterator for vectors until Matt
pointed it out last night (about 20 minutes before we released the report!).
Consistency would suggest that there should be one, but note that it is
easier to write an active iterator for a vector than it is to write a
passive one:
   for I in First(Vect) .. Last(Vect) loop
      -- Do whatever.
   end loop;
versus
   declare
      procedure Process (I : in Index_Subtype) is
      begin
         -- Do whatever.
      end Process;
      procedure Do_It_All is new Generic_Iterator (Process);
   begin
      Do_It_All (Vect);
   end;
Besides being longer and harder to read, you have to know or look up the
index subtype for the vector in order to write this. So we reached no
conclusion about that in the 20 minutes we had to think about it.
Marius Amado Alves wrote:
> /* However to take the exception route the exception should be defined.
> (Exit/Terminate_Immediately, _Now, _Prematurely?) Or a specification be made
> of what exceptions the iterator is guaranteed to propagate. Simply "all"
> would do. Maybe this is already there. I'm sorry, I didn't have time to read
> the AI fully yet. */
The wording for Generic_Iteration for a Map says:
Generic_Iteration calls Process with a cursor that designates each
node in the Map. Any exceptions raised during Process are propagated.
So it's covered. This is important, because it means that the implementation
must be able to clean itself up (if any is needed) when an exception
propagates - it can't leave the Map in an unstable state.
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, February 4, 2004 8:53 PM
AI-302-03 asks
> Anybody got better wording [for the quality of the String hashing
> function]? Matt was nice enough to ignore these definitions
> completely!
See
P. K. Pearson, "Fast Hashing of Variable-Length Text Strings," Comm.
ACM, 1990 Jun
It describes a "hashing function specifically tailored to
variable-length text strings." It says that "similar strings are not
likely to collide." (An implementation can be found in
PragmARC.Hash_Fast_Variable_Length.) Perhaps you might think this last
quote is "better wording".
The actual algorithm produces 8-bit hash values, which may no longer be
considered adequate, given
> Hash_Type'Modulus shall be at least as large as the smaller of
> System.Max_Binary_Modulus and 2**32.
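[Editor's note: The following is a minimal sketch of the 8-bit Pearson-style
hash described above: start with zero and, for each character, replace the
hash with a table lookup of (hash xor character). Pearson's scheme relies on
a table that is a permutation of 0 .. 255. The affine table built by
Make_Table here is an assumption chosen only to keep the sketch
self-contained; it is not the table from the paper or from PragmARC, and it
is unlikely to distribute keys as well as a well-shuffled one. Widening the
result to Hash_Type is not shown.]

```ada
with Ada.Text_IO;

procedure Pearson_Demo is

   type Byte is mod 2 ** 8;
   type Table is array (Byte) of Byte;

   --  Builds a permutation of 0 .. 255. An odd multiplier modulo 256
   --  guarantees every value appears exactly once. (Illustrative only.)
   function Make_Table return Table is
      T : Table;
   begin
      for I in Byte loop
         T (I) := I * 167 + 13;  --  modular arithmetic wraps at 256
      end loop;
      return T;
   end Make_Table;

   T : constant Table := Make_Table;

   --  One table lookup per character of the key.
   function Hash (Key : String) return Byte is
      H : Byte := 0;
   begin
      for I in Key'Range loop
         H := T (H xor Byte (Character'Pos (Key (I))));
      end loop;
      return H;
   end Hash;

begin
   Ada.Text_IO.Put_Line ("frog ->" & Byte'Image (Hash ("frog")));
   Ada.Text_IO.Put_Line ("frop ->" & Byte'Image (Hash ("frop")));
end Pearson_Demo;
```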
I have some comments on the proposal:
The proposal has a structure called a "Vector" which is actually a list,
which is a sequence that allows insertions and deletions at any point.
"Vector" refers to a mathematical concept related to matrices to most
software engineers. It may be that the STL refers to lists as vectors,
but I hope we do not have to follow C++'s mistakes.
Further, the proposal requires an inefficient array implementation, and
several of the operations refer to this implementation. I think this is
a mistake. Specify a general, unbounded list and let the implementor
choose the implementation (which could be an array). As the proposal
points out, correctly implementing a general list is not trivial, so it
makes sense for a standard library to provide a list.
Maps and sets also specify a specific implementation.
If the intention is to have an extensible array structure, then I
suggest that they be called Extensible_Arrays.
Vector should have an iterator, in addition to allowing the user to
explicitly iterate over the structure.
> Open issue: This function returns a value that doesn't depend on its
> parameter. It possibly could be removed in favor of just saying
> Index_Type'Pred(Index_Type'First) appropriately. Committee discussion
> with the original proposal's author was inconclusive.
I'd say that it should be a constant, not a function. The same seems to
hold for First.
Given that Back is defined as Index_Type'Succ (Last (Vector) ), and Last
(Vector) could be Index_Type'Last, there seems to be a problem. There
should be an assertion that Index_Type'Base'Last > Index_Type'Last.
All the problems with Index_Type disappear with a general list, which
would use a cursor.
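[Editor's note: A minimal sketch of the Back concern raised above, assuming a
small user-defined index type. Back is one past the last index, so it must
be representable in Index_Type'Base. For a type declared with a narrow
range, the base range is normally wider and the assertion holds; for an
index type whose range already fills its base range (for example, a type
derived from Integer and used in full) it would fail. The names here are
illustrative, and pragma Assert is an Ada 2005 feature.]

```ada
with Ada.Text_IO;

procedure Back_Demo is

   type Index_Type is range 1 .. 10;

   Last : constant Index_Type := Index_Type'Last;

   --  'Succ yields a value of the base type, so no range check against
   --  1 .. 10 is applied; the object is declared with the base subtype.
   Back : constant Index_Type'Base := Index_Type'Succ (Last);

begin
   --  The condition suggested in the mail: room must exist above 'Last.
   pragma Assert (Index_Type'Base'Last > Index_Type'Last);

   Ada.Text_IO.Put_Line ("Back =" & Index_Type'Base'Image (Back));
end Back_Demo;
```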
I would propose that the sort algorithm be made available to users for
normal array types as well as for vectors. That would involve putting it
in its own library unit and referring to that unit in Vectors.
The Map structure is required to be implemented with a hash table. If
we're going to have such a requirement, it should at least be named
Hashed_Maps.
An important thing about maps is that they provide fast searching,
typically based on a lower-level structure such as a hash table or
balanced tree. Such structures have uses of their own in addition to
creating maps, and independent of the key/value concept of a map. For
example, an application may collect a number of values and then need to
quickly determine if a value is in that collection, and a searchable
structure with a Get_First operation can be used for a priority queue.
None of these applications use key/value pairs. Therefore, I think it's
important to provide the underlying searchable structure to users.
(Indeed, given the ease with which a user can wrap a key/value pair in a
record, define comparison operations for that record that only use the
key part, and create a map structure, given the existence of a
searchable structure, it could be argued, since the proposal states that
easily implemented structures should not be part of the library, that
the library should only supply searchable structures, and not maps.)
Do we really need Maps.[Wide_]Strings, given that an Unbounded_String
can be used for the key type, and that this library should not be used
for applications in which the use of Unbounded_Strings is not acceptable?
The Sets package is mostly incomprehensible. Sets deal with elements,
and operations must include testing if an element is in a set, creating
a set from a list of elements (set "literals"), and set union,
intersection, difference, and symmetric difference. Except for the
membership test, these are missing from the package, so I don't see what
it has to do with sets. It appears to be a searchable structure, not a
set. This is corroborated by the package Generic_Keys, which allows the
structure to be used as a map.
The discussion of the package begins by talking about nodes, which is an
undefined term. The reader has no idea what it has to do with the
package, which is not specified in terms of nodes.
"Sans" is a French word. Since the ARM is in English, we should use the
English "without" instead. "No" might also be acceptable.
I'd like to thank the select committee for their work. No library will
completely please everyone. I will welcome any standard container
library in Ada 0X.
****************************************************************
From: Tucker Taft
Sent: Wednesday, February 4, 2004 9:24 PM
The term "vector" for extensible array is used in Java
as well. I think we should strive to use terminology
that has become widely used in the programming community.
I personally consider an extensible array (i.e. a vector) a useful and
important standard container. I don't feel the same way about a linked
list, because it is so easy to implement what you want, and there
are so many options when it comes to how to link the objects
together that having a standard container for that hardly
seems worthwhile (IMHO).
So we settled on Vector, Map, and Set as three basic yet
important abstractions that will help lift the level of
programming above arrays and records. In my experience
with using languages that have large container libraries,
it is these three that are used more widely than all
the others combined.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 4, 2004 9:29 PM
I agree with one caveat: we're already adding something else called "Vector"
to the standard (see AI-296), and two might just be too confusing.
But, the container vector is more useful than the list container (because of
the calculated O(1) access to elements). And they're too similar to support
both when we're trying to support something manageable.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 4, 2004 9:39 PM
Jeffrey Carter said:
...
> Further, the proposal requires an inefficient array implementation, and
> several of the operations refer to this implementation. I think this is
> a mistake. Specify a general, unbounded list and let the implementor
> choose the implementation (which could be an array). As the proposal
> points out, correctly implementing a general list is not trivial, so it
> makes sense for a standard library to provide a list.
>
> Maps and sets also specify a specific implementation.
No, an implementation is suggested (in AARM notes), as are performance
characteristics. That was one of the larger changes to Matt's original
proposal. If we made that change incompletely somewhere, that needs to be
fixed.
That said, the most important thing is that all implementations have
consistent performance characteristics (so that porting a program from GNAT
to ObjectAda doesn't fail for performance reasons). If GNAT used an array
implementation and ObjectAda used a list implementation for a Vector, access
to elements (which would be O(N) on the imagined OA implementation) could be
too slow for the port to be viable. That needs to be avoided. OTOH,
specifying too much about the implementation would prevent using a better
one -- in that case, we might as well just specify the source code of the
entire library (including the bodies!), and we don't need all of this
wording!
> I would propose that the sort algorithm be made available to users for
> normal array types as well as for vectors. That would involve putting it
> in its own library unit and referring to that unit in Vectors.
Bad idea. To do that, you'd need to provide generic formal accessor functions;
that would have a huge overhead of function calls for both Vectors and
Arrays. On a code shared implementation like Janus/Ada, it probably would
run ten times slower than the specified one.
If we want an array sort, we should declare one:
   generic
      type Index_Type is (<>);
      type Element_Type is private;
      with function "<" (Left, Right : Element_Type) return Boolean is <>;
      type Array_Type is array (Index_Type) of Element_Type;
   procedure Ada.Generic_Sort (Arr : in out Array_Type);
(We'd need an unconstrained version, too.) But keep it separate from the
Vector one (or any List one, for that matter).
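[Editor's note: A minimal sketch of how a constrained array sort with the
profile above might be written and instantiated. The generic is declared
locally as Generic_Sort rather than as a child of Ada, the body uses a simple
insertion sort purely for brevity (no particular algorithm is implied by the
mail), and the types Day and Counts are assumptions for illustration.]

```ada
with Ada.Text_IO;

procedure Sort_Demo is

   generic
      type Index_Type is (<>);
      type Element_Type is private;
      with function "<" (Left, Right : Element_Type) return Boolean is <>;
      type Array_Type is array (Index_Type) of Element_Type;
   procedure Generic_Sort (Arr : in out Array_Type);

   procedure Generic_Sort (Arr : in out Array_Type) is
      Tmp : Element_Type;
      J   : Index_Type;
   begin
      if Arr'Length < 2 then
         return;  --  nothing to do, and avoids 'Succ past the last index
      end if;
      for I in Index_Type'Succ (Arr'First) .. Arr'Last loop
         Tmp := Arr (I);
         J   := I;
         --  Shift larger elements right until Tmp's position is found.
         while J > Arr'First and then Tmp < Arr (Index_Type'Pred (J)) loop
            Arr (J) := Arr (Index_Type'Pred (J));
            J       := Index_Type'Pred (J);
         end loop;
         Arr (J) := Tmp;
      end loop;
   end Generic_Sort;

   type Day is (Mon, Tue, Wed, Thu);
   type Counts is array (Day) of Integer;

   --  "<" is picked up by default from the predefined Integer operator.
   procedure Sort is new Generic_Sort
     (Index_Type   => Day,
      Element_Type => Integer,
      Array_Type   => Counts);

   C : Counts := (Mon => 7, Tue => 2, Wed => 9, Thu => 4);

begin
   Sort (C);
   for D in Day loop
      Ada.Text_IO.Put_Line (Day'Image (D) & " =>" & Integer'Image (C (D)));
   end loop;
end Sort_Demo;
```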
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 5, 2004 9:31 AM
I have hosted a reference implementation at my Earthlink home page:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040205.zip>
For now it only includes the vector. There's a test_sort program in
there too, so you have something you can run.
I'll have the set and maps done in a few days.
****************************************************************
From: Robert A. Duff
Sent: Thursday, February 5, 2004 10:13 AM
Thanks, Matt!
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 5, 2004 10:58 AM
Randy Brukardt wrote:
> No, an implementation is suggested (in AARM notes), as are performance
> characteristics. That was one of the larger changes to Matt's original
> proposal. If we made that change incompletely somewhere, that needs to be
> fixed.
The normative text for vectors says
"A vector container object manages an unconstrained internal array"
That specifies an array implementation.
> Bad idea. To do that, you'd need to provide generic formal accessor functions;
> that would have a huge overhead of function calls for both Vectors and
> Arrays. On a code shared implementation like Janus/Ada, it probably would
> run ten times slower than the specified one.
Given that an array implementation is specified, there is no need for
formal accessor functions. The vector can simply call an instantiation
of the sort with the appropriate slice of its internal array. Since we
require such an algorithm to exist, and it is useful to many users, it
makes sense for it to be available outside the vector package.
> If we want an array sort, we should declare one:
>
>    generic
>       type Index_Type is (<>);
>       type Element_Type is private;
>       with function "<" (Left, Right : Element_Type) return Boolean is <>;
>       type Array_Type is array (Index_Type) of Element_Type;
>    procedure Ada.Generic_Sort (Arr : in out Array_Type);
>
> (We'd need an unconstrained version, too.) But keep it separate from the
> Vector one (or any List one, for that matter).
If we only have one, I'd prefer it to be unconstrained. That allows
operations such as the vector sort discussed above, where the size of
the slice may change from call to call, without repeated instantiations.
Sort for a list is a different creature. Merge sort is a good choice
there, since a list already has the O(N) additional space that merge
sort requires for array sorting (in the links), provided you have access
to the list internals. Thus you get O(N log N) time in all cases and
O(1) space.
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 5, 2004 3:23 PM
Jeff Carter wrote:
> The normative text for vectors says
>
> "A vector container object manages an unconstrained internal array"
>
> That specifies an array implementation.
Precisely my point. That is intended to say that there is a logical array in
the container, but not necessarily an actual one. Matt's descriptions were
too implementation-specific, and we moved most of that. But I'm not
surprised that some was missed.
...
> Given that an array implementation is specified, there is no need for
> formal accessor functions. The vector can simply call an instantiation
> of the sort with the appropriate slice of its internal array. Since we
> require such an algorithm to exist, and it is useful to many users, it
> makes sense for it to be available outside the vector package.
There is no intent that an array implementation is specified (it certainly
won't be implemented that way on Janus/Ada); only that the performance
characteristics are similar to (or better than) those of an array
implementation.
In any case, I have no idea how an external generic would be able to mess
around with the internal array - it certainly can't see it! You'd have to
put the sort into the spec in order to do that -- and that's what's proposed
and what you're objecting to.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 5, 2004 3:40 PM
Randy Brukardt wrote:
> Precisely my point. That is intended to say that there is a logical array in
> the container, but not necessarily an actual one.
Yes, exactly. This allows the implementor to leave the type system in
order to choose the optimal implementation for the vector container.
An implementor can use any implementation that satisfies the property
that insertion at the back end is (amortized) constant time, and the
property that random access is constant time.
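[Editor's note: A minimal sketch of one way to obtain the two properties
just stated, amortized constant-time insertion at the back and
constant-time random access, using a heap-allocated array whose capacity
doubles when full. This is only an illustration; the type and operation
names (Vector, Append, Capacity) and the initial capacity of 4 are
assumptions, and an implementation is free to use a different structure that
meets the same bounds.]

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Growth_Demo is

   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Access is access Int_Array;

   procedure Free is
     new Ada.Unchecked_Deallocation (Int_Array, Int_Array_Access);

   type Vector is record
      Data   : Int_Array_Access;  --  null until the first Append
      Length : Natural := 0;      --  number of elements in use
   end record;

   function Capacity (V : Vector) return Natural is
   begin
      if V.Data = null then
         return 0;
      end if;
      return V.Data'Length;
   end Capacity;

   --  Doubling the capacity when full means each element is copied a
   --  bounded number of times on average, giving amortized O(1) appends.
   procedure Append (V : in out Vector; Item : in Integer) is
   begin
      if V.Data = null then
         V.Data := new Int_Array (1 .. 4);
      elsif V.Length = V.Data'Length then
         declare
            Bigger : constant Int_Array_Access :=
              new Int_Array (1 .. 2 * V.Data'Length);
         begin
            Bigger (1 .. V.Length) := V.Data.all;
            Free (V.Data);
            V.Data := Bigger;
         end;
      end if;
      V.Length := V.Length + 1;
      V.Data (V.Length) := Item;
   end Append;

   V : Vector;

begin
   for I in 1 .. 100 loop
      Append (V, I * I);
   end loop;

   --  Random access is a plain array index: constant time.
   Ada.Text_IO.Put_Line ("Element 50 =" & Integer'Image (V.Data (50)));
   Ada.Text_IO.Put_Line ("Length     =" & Natural'Image (V.Length));
   Ada.Text_IO.Put_Line ("Capacity   =" & Natural'Image (Capacity (V)));

   Free (V.Data);
end Growth_Demo;
```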
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 5, 2004 6:52 PM
Randy Brukardt wrote:
>>The normative text for vectors says
>>
>>"A vector container object manages an unconstrained internal array"
>>
>>That specifies an array implementation.
>
> Precisely my point. That is intended to say that there is a logical array in
> the container, but not necessarily an actual one. Matt's descriptions were
> too implementation-specific, and we moved most of that. But I'm not
> surprised that some was missed.
I read it as specifying an implementation. I suggest the wording be
revised to make it clear that the discussion is of a logical array, not
a requirement for an actual array.
> In any case, I have no idea how an external generic would be able to mess
> around with the internal array - it certainly can't see it! You'd have to
> put the sort into the spec in order to do that -- and that's what's proposed
> and what you're objecting to.
I guess I wasn't clear. You would provide the external sort, and also
specify the sort in the spec, with wording that the sort has the same
characteristics as the external sort. This is based on the assumption
that an array implementation is specified, so the sort algorithm, useful
on arrays, must exist anyway.
I'm reminded of my surprise that Ada-83 compilers had to support
infinite-precision arithmetic, but the language did not require that it
be made available to users. If the compiler writers have to implement the
functionality, why not make it available to users? Case-insensitive
string comparison is a similar thing: compilers have to recognize that
frog, Frog, and FROG are the same identifier, but are (were) not
required to make such comparisons available to users.
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 5, 2004 11:38 AM
Tucker Taft wrote:
> The term "vector" for extensible array is used in Java
> as well. I think we should strive to use terminology
> that has become widely used in the programming community.
I disagree, even though I know that's dangerous when discussing Ada with
STT. An application that uses both extensible arrays and mathematical
vectors will be very confusing if both are called vectors. Since an
explicit design goal of Ada is to emphasize ease of reading, calling an
extensible array a vector seems inappropriate.
> I personally consider an extensible array (i.e. a vector) a useful and
> important standard container. I don't feel the same way about a linked
> list, because it is so easy to implement what you want, and there
> are so many options when it comes to how to link the objects
> together that having a standard container for that hardly
> seems worthwhile (IMHO).
I have no objections to an extensible array, provided it's clearly
identified as such. I think it should look different from the proposal,
but that's mainly a taste issue. I'd want direct analogs to indexing,
both LHS and RHS (Put and Get?); slices, both LHS and RHS (Replace_Slice
and Slice?); and 'First, 'Last, and 'Length (though 'First is a constant
for an EA). An equivalent to 'range would be nice, but impossible. The
only difference to a normal array would be that Put and Replace_Slice
can accept indices not in First .. Last. I haven't given it a great deal
of thought, so I'm sure I'm missing some subtleties, but I don't see a
need for Front, Back, Insert, Delete, and so on.
The proposal says that containers "that are relatively easy to code,
redundant, or rarely used are omitted". It also says that lists are
difficult to implement correctly. Given a list, structures such as
deques, stacks, and especially queues are easy to implement. Since
queues are common structures and not redundant (none of the proposed
containers provides an efficient implementation of a queue), the
proposal itself seems to argue that lists should be provided, since they
are not easy to code correctly, and provide a basis for the user to
easily code queues.
> So we settled on Vector, Map, and Set as three basic yet
> important abstractions that will help lift the level of
> programming above arrays and records. In my experience
> with using languages that have large container libraries,
> it is these three that are used more widely than all
> the others combined.
There was an article by Mills [Harlan D. Mills, Richard C. Linger: Data
Structured Programming: Program Design without Arrays and Pointers. IEEE
Trans. Software Eng. 12(2): 192-197 (1986)] that proposed that
applications only use queues, stacks, and sets (real sets, with union,
intersection, and such operations). It's an interesting concept, and I
agree with the aim of programs using appropriate abstractions and hiding
lower level implementation details, especially use of pointers.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Thursday, February 5, 2004 9:04 AM
Tucker Taft wrote:
> The term "vector" for extensible array is used in Java
> as well. I think we should strive to use terminology
> that has become widely used in the programming community.
So call it Java_Vector - that will be at least consistent.
Do you think that Java meaning for "vector" is more significant for Ada than
mathematical meaning of this term (which never implied extensibility) ?
Why not call that thing Flexible_Array (after Algol-68, I think) - this name
will directly reflect the essence.
****************************************************************
From: Robert A. Duff
Sent: Thursday, February 5, 2004 1:37 PM
Bill Wulf and other professors at CMU circa late 1970's were using the
term "vector" to mean "array" (not necessarily extensible); that's the
first time *I* heard it. So it's not a Java-ism.
I think this meaning of "vector" derives from the maths meaning,
even if it's not precisely the same thing.
****************************************************************
From: Stephen Leake
Sent: Thursday, February 5, 2004 2:18 PM
Jeffrey Carter <jrcarter@acm.org> writes:
> Tucker Taft wrote:
>
> > The term "vector" for extensible array is used in Java
> > as well. I think we should strive to use terminology
> > that has become widely used in the programming community.
>
> I disagree, even though I know that's dangerous when discussing Ada
> with STT. An application that uses both extensible arrays and
> mathematical vectors will be very confusing if both are called
> vectors. Since an explicit design goal of Ada is to emphasize ease of
> reading, calling an extensible array a vector seems inappropriate.
I agree with Tucker. I have code that uses both Cartesian vectors and
extensible arrays. One is SAL.Math_Double.DOF_3.Cart_Vector_Type, the
other is SAL.Poly.Unbounded_Arrays. Obviously, I have different names
for them, as Carter wants. But if I called them
SAL.Math_Double.DOF_3.Vector and SAL.Poly.Vector, I would have no
chance of confusion. That's what package hierarchies are for.
Since both Java and C++ use the term "vector" for an extensible array,
I think Ada should also. Part of the point of the OY revision is to
make the language more attractive to current users of other languages.
This is an easy way to do that.
> (Replace_Slice and Slice?); and 'First, 'Last, and 'Length (though
> 'First is a constant for an EA).
'First is not constant for SAL.Poly.Unbounded_Arrays; I provide both
Append and Prepend operations. I don't think I've ever used Prepend,
though; it was really just an exercise in what was possible.
> .. I don't see
> a need for Front, Back, Insert, Delete, and so on.
I use Insert and Delete in real applications.
> The proposal says that containers "that are relatively easy to code,
> redundant, or rarely used are omitted". It also says that lists are
> difficult to implement correctly. Given a list, structures such as
> deques, stacks, and especially queues are easy to implement. Since
> queues are common structures and not redundant (none of the proposed
> containers provides an efficient implementation of a queue), the
> proposal itself seems to argue that lists should be provided, since
> they are not easy to code correctly, and provide a basis for the
> user to easily code queues.
I agree. A lists package would be nice.
But I also agree with Tucker, that it is difficult to come up with one
list package that really meets a wide range of needs.
Perhaps one list package, that meets a narrow range of needs, would
still be useful. It would set a style standard for other list packages.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 5, 2004 2:48 PM
Alexandre E. Kopilovitch wrote:
> So call it Java_Vector - that will be at least consistent.
>
> Do you think that Java meaning for "vector" is more significant for Ada than
> mathematical meaning of this term (which never implied extensibility) ?
>
> Why not call that thing Flexible_Array (after Algol-68, I think) - this name
> will directly reflect the essence.
Tucker T. and Bob D. are both correct: the container is a "vector."
Alexandre K. and Jeff C. are both incorrect. The container is not a
list, not a Java_Vector, not an Extensible_Array, and not a Flexible_Array.
It is a vector. It has the same semantics as the identically-named
container in the STL. The one named "vector."
The container whose name is vector does not have array semantics. There
is no slicing for example.
The container whose name is vector has the following important properties:
o inserting at the back end is amortized constant time
o supports random access of elements, in constant time
Yes, internally a vector is implemented as an array. The Size function
returns the length of this internal array, and Resize can be used to
expand its length.
But it is not an array. It is a container. Whose name is "vector".
Just like the one in the STL.
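[Editor's note: The "amortized constant time" property listed above can be
illustrated with a minimal sketch. It is not taken from the proposal or from
the reference implementation; Sketch_Vector, Append, and the doubling policy
are assumptions chosen only for illustration.]

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Growth_Sketch is
   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Access is access Int_Array;

   procedure Free is
      new Ada.Unchecked_Deallocation (Int_Array, Int_Array_Access);

   --  Hypothetical extensible array: a heap-allocated buffer plus the
   --  number of slots currently in use.
   type Sketch_Vector is record
      Data   : Int_Array_Access := new Int_Array (1 .. 1);
      Length : Natural := 0;
   end record;

   Copies : Natural := 0;  --  elements moved during all expansions

   procedure Append (V : in out Sketch_Vector; Item : in Integer) is
   begin
      if V.Length = V.Data'Length then
         --  Buffer is full: allocate one twice as large, copy, and
         --  release the old one.
         declare
            Bigger : constant Int_Array_Access :=
              new Int_Array (1 .. 2 * V.Data'Length);
         begin
            Bigger (1 .. V.Length) := V.Data (1 .. V.Length);
            Copies := Copies + V.Length;
            Free (V.Data);
            V.Data := Bigger;
         end;
      end if;
      V.Length := V.Length + 1;
      V.Data (V.Length) := Item;
   end Append;

   V : Sketch_Vector;
begin
   for I in 1 .. 1000 loop
      Append (V, I);
   end loop;
   Ada.Text_IO.Put_Line ("Length:" & Natural'Image (V.Length));
   Ada.Text_IO.Put_Line ("Capacity:" & Natural'Image (V.Data'Length));
   Ada.Text_IO.Put_Line ("Elements copied:" & Natural'Image (Copies));
   Free (V.Data);
end Growth_Sketch;
```

Because the buffer doubles, the total number of elements copied stays below
twice the number of appends, so the average cost per Append is constant even
though an individual expansion is linear. Random access is a plain array
index.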
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Thursday, February 5, 2004 3:46 PM
No problem with all that if another term was chosen. Now, with "vector", this
is name squatting (well, participation in name squatting in Ada case), which
is fully appropriate for Java, somehow understandable for C++, but seems
(still) inappropriate for Ada, especially taking into account that the involved
term belongs to some Ada-friendly domain.
****************************************************************
From: Robert A. Duff
Sent: Thursday, February 5, 2004 3:38 PM
I wrote:
> Bill Wulf and other professors at CMU circa late 1970's were using the
> term "vector" to mean "array" (not necessarily extensible); that's the
> first time *I* heard it. So it's not a Java-ism.
Actually, the meaning was "one-dimensional array". But there was no
implication that they could grow.
> I think this meaning of "vector" derives from the maths meaning,
> even if it's not precisely the same thing.
I mean, what's a vector in 3-space? Basically, a one-dimensional array
of 3 real numbers -- the X, Y, and Z coordinates.
Matt wrote:
> It is a vector. It has the same semantics as the identically-named
> container in the STL. The one named "vector."
This stuff comes from the C++ STL. I think gratuitous differences from
that are unhelpful. (But I admit that I was one of the folks pushing
for "cursor" instead of "iterator".)
> The container whose name is vector does not have array semantics. There
> is no slicing for example.
Well, "no slicing" is hardly fundamental. It could be added, or
programmed by the client.
> The container whose name is vector has the following important properties:
>
> o inserting at the back end is amortized constant time
> o supports random access of elements, in constant time
I think "random access" is the essence of array semantics. After all,
anything you can do with an array you can do with a linked list, and
vice versa -- the only fundamental difference is the efficiency
properties.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 5, 2004 9:31 AM
Robert A Duff wrote:
> This stuff comes from the C++ STL. I think gratuitous differences from
> that are unhelpful. (But I admit that I was one of the folks pushing
> for "cursor" instead of "iterator".)
Yes. The world has settled on the name "vector." Let's use the terms
everyone else is using, unless we have a good reason not to.
(BTW, that's also why I used the name "Iterator_Type". But I have no
issues with the name "Cursor_Type".)
> I think "random access" is the essence of array semantics. After all,
> anything you can do with an array you can do with a linked list, and
> vice versa -- the only fundamental difference is the efficiency
> properties.
But that's the essence of the argument!
Yes, it's *possible* to seek to specific elements in a linked list, but
I would hardly call that "random access."
If you need fast random access to the elements in a container, and the
number of elements in the container is large, then you can effectively
rule out using a linked list as the container.
Of course you could make the argument the other way. If you need
constant-time insertion of elements at any position, then that
effectively rules out a vector, in favor of a list.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Thursday, February 5, 2004 3:21 PM
Robert A Duff wrote:
> Bill Wulf and other professors at CMU circa late 1970's were using the
> term "vector" to mean "array" (not necessarily extensible); that's the
> first time *I* heard it.
Yes, CMU always was (as far as I know) primarily an engineering educational
facility, and I know well that engineers (not software engineers, but rather
the general kind of engineers) often called any column or row of numbers a
"vector" (not bothering themselves with the question of how the components of
that "vector" transform with a change of coordinate system). But apparently
they never used this term for arrays of any other objects, and I have almost
never seen a case (even in engineering) where "vector" was used for an
extensible array - except Java and perhaps some C++ libraries.
A notable exception is APL, in which "vector" is the basic term, and that
"vector" is extensible. But in APL that "vector" is equipped with vast
nomenclature of functions, many of them associated with genuine mathematical
vectors, so the entire balance for the term was acceptable.
> So it's not a Java-ism.
Yes, not exactly - there were other precedents of sloppy usage of this term.
But nevertheless a strong impression remains that it is exactly Java, which
is a real reason, ground and reference for proposing this term for extensible
arrays *now and for Ada0Y*.
> I think this meaning of "vector" derives from the maths meaning,
> even if it's not precisely the same thing.
No, not at all - it lacks the primary mathematical meaning of it, and adds
a primary feature whose meaning is totally non-mathematical (that is, there
is no attempt to bring any mathematical meaning to it... and it will not be
simple, if attempted).
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 5, 2004 5:11 PM
Jeffrey Carter wrote:
> The actual algorithm produces 8-bit hash values, which may no longer be
> considered adequate, given
>
>> Hash_Type'Modulus shall be at least as large as the smaller of
>> System.Max_Binary_Modulus and 2**32.
In Charles I copied the hash function from GNAT. I figure if it's good
enough for Robert Dewar it's good enough for me...
> Vector should have an iterator, in addition to allowing the user to
> explicitly iterate over the structure.
No. Vector iterators are fragile, and hence very error prone.
They are fragile because the (logical) internal array gets thrown away
during expansion, which invalidates the iterator. It's too hard to keep
track of whether a vector iterator is still valid, and most of the time
you end up with a dangling reference.
The STL has vector iterators in order to provide the infrastructure
necessary to support generic algorithms.
In Ada they are not necessary, because you can use locally-declared
subprograms to fit within such a framework.
>> Open issue: This function returns a value that doesn't depend on its
>> parameter. It possibly could be removed in favor of just saying
>> Index_Type'Pred(Index_Type'First) appropriately. Committee discussion
>> with the original proposal's author was inconclusive.
>
>
> I'd say that it should be a constant, not a function. The same seems to
> hold for First.
Front can probably go away. First is there for consistency with other
containers.
> Given that Back is defined as Index_Type'Succ (Last (Vector) ), and Last
> (Vector) could be Index_Type'Last, there seems to be a problem. There
> should be an assertion that Index_Type'Base'Last > Index_Type'Last.
That's not really possible for generic actual index types such as
Natural or Positive.
We could get rid of the assertion, but this would impact implementors.
That's why it's still an open issue.
In my reference implementation, I don't think the generic actual type
has to have IT'Base'First < IT'First, since internally I use Integer
subtypes for everything.
http://home.earthlink.net/~matthewjheaney/charles/ai302-20040205.zip
> All the problems with Index_Type disappear with a general list, which
> would use a cursor.
The original proposal included list containers, but they were not
included in the subcommittee report, in order to keep the size of the
report more manageable.
> An important thing about maps is that they provide fast searching,
> typically based on a lower-level structure such as a hash table or
> balanced tree.
My original proposal had both sorted and hashed maps, but in order to
keep the subcommittee report small support for sorted maps was removed.
> Such structures have uses of their own in addition to
> creating maps, and independent of the key/value concept of a map. For
> example, an application may collect a number of values and then need to
> quickly determine if a value is in that collection, and a searchable
> structure with a Get_First operation can be used for a priority queue.
That's what the sorted set is for.
> None of these applications use key/value pairs.
So use the sorted set.
> Therefore, I think it's
> important to provide the underlying searchable structure to users.
Just use the sorted set container. It guarantees that searches only
take O (log N) even in the worst case.
> (Indeed, given the ease with which a user can wrap a key/value pair in a
> record, define comparison operations for that record that only use the
> key part, and create a map structure, given the existence of a
> searchable structure, it could be argued, since the proposal states that
> easily implemented structures should not be part of the library, that
> the library should only supply searchable structures, and not maps.)
The (hashed) map stores the key and element as separate components of
the internal node of storage.
If you have a record like that, containing a key-part component, then
use the sorted set, and instantiate the nested generic package Generic_Keys.
> Do we really need Maps.[Wide_]Strings, given that an Unbounded_String
> can be used for the key type, and that this library should not be used
> for applications in which the use of Unbounded_Strings is not acceptable?
Yes, we really need string-key maps.
> The Sets package is mostly incomprehensible. Sets deal with elements,
> and operations must include testing if an element is in a set, creating
> a set from a list of elements (set "literals"), and set union,
> intersection, difference, and symmetric difference. Except for the
> membership test, these are missing from the package, so I don't see what
> it has to do with sets. It appears to be a searchable structure, not a
> set. This is corroborated by the package Generic_Keys, which allows the
> structure to be used as a map.
A "set" is really any sorted sequence of items. If you want set
intersection, symmetric difference, etc, then just use a generic
algorithm. See the Charles library for such algorithms.
Of course, if you want the target of a set union operation to be the set
itself, then just use Insert to insert the items.
The subcommittee report has several examples of how sets are used, and
there's at least one example showing how to use the nested generic package.
See the last two slides in my AE-2003 paper presentation for an example
of how to take the union of a set and a (sorted) list:
http://home.earthlink.net/~matthewjheaney/charles/charlesppt.htm
My original proposal has the same example at the very end:
http://home.earthlink.net/~matthewjheaney/charles/ai302.txt
> "Sans" is a French word. Since the ARM is in English, we should use the
> English "without" instead. "No" might also be acceptable.
Je crois que non. C'est une bonne idea.
The name for Delete_Sans_Increment comes from Emacs lisp, which has the
functions file-name-sans-extension and file-name-sans-versions.
It was also in homage to Ada's French history, given that her original
designer was French, and worked for a French company.
Why do you think "rendezvous" was named that way?
> I'd like to thank the select committee for their work. No library will
> completely please everyone. I will welcome any standard container
> library in Ada 0X.
If you don't immediately grok how vectors and sets and maps work, then I
suggest familiarizing yourself with the STL. There are lots of tutorials
on the WWW.
I also recommend Stanley Lippman's little book Essential C++. That was
my introduction to the STL, and what originally convinced me that
Stepanov's approach was the correct one.
You might also like Accelerated C++ by Andrew Koenig and Barbara Moo,
which uses the STL as a basis for teaching C++.
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 5, 2004 5:49 PM
Matt's too modest. The tutorial that makes up the !example section is
actually quite good. I learned a lot about how the packages work (and how to
use them) from reading it carefully, and I recommend that everyone do that
to better understand Matt's work.
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 5, 2004 3:48 PM
Jeffrey Carter wrote:
...
> > I personally consider an extensible array (i.e. a vector) a useful and
> > important standard container. I don't feel the same way about a linked
> > list, because it is so easy to implement what you want, and there
> > are so many options when it comes to how to link the objects
> > together that having a standard container for that hardly
> > seems worthwhile (IMHO).
>
> I have no objections to an extensible array, provided it's clearly
> identified as such. I think it should look different from the proposal,
> but that's mainly a taste issue. I'd want direct analogs to indexing,
> both LHS and RHS (Put and Get?); slices, both LHS and RHS (Replace_Slice
> and Slice?); and 'First, 'Last, and 'Length (though 'First is a constant
> for an EA). An equivalent to 'range would be nice, but impossible. The
> only difference to a normal array would be that Put and Replace_Slice
> can accept indices not in First .. Last. I haven't given it a great deal
> of thought, so I'm sure I'm missing some subtleties, but I don't see a
> need for Front, Back, Insert, Delete, and so on.
Let's see:
- direct analogs to indexing, both LHS and RHS (Element, Replace_Element);
- slices (nope);
- 'First (First), 'Last (Last), 'Length (Length);
Looks like pretty much everything is in there. And slicing will be expensive
if the implementation is not a straight array, so it's somewhat dubious.
Insert and Delete provide easier ways of adding or removing items than
slices - and how often do you use a slice of a non-string type for something
other than inserting or deleting elements anyway??
Ada doesn't support (and isn't going to support) user-defined indexing or
user-defined attributes, so this is about the best you can do. So what's the
complaint (other than the name)??
> The proposal says that containers "that are relatively easy to code,
> redundant, or rarely used are omitted". It also says that lists are
> difficult to implement correctly.
I think that's a mistake; only very rare operations are difficult to code.
We didn't update every piece of the original text, and that one is
misleading.
> Given a list, structures such as
> deques, stacks, and especially queues are easy to implement. Since
> queues are common structures and not redundant (none of the proposed
> containers provides an efficient implementation of a queue), the
> proposal itself seems to argue that lists should be provided, since they
> are not easy to code correctly, and provide a basis for the user to
> easily code queues.
The user can easily code a queue in terms of a Vector (that's one of the
uses of Insert!). We dropped the list component because it had an identical
interface to the Vector component, but was less flexible (no computed O(1)
access).
In any case efficiency is not a goal of the standard containers. It would be
incorrect for the standard to specify performance to the point that only a
single implementation would be possible. Moreover, we anticipate a secondary
standard that *does* try to provide more control over performance (by adding
lists, bounded forms, etc.)
In my view, it is a mistake for projects to depend on standard containers
where there are critical performance requirements (not just time, but also
space as well). In that case, you really have to have control of the
implementation -- you really need *all* of the source code. You can't trust
something provided by the standard (or your compiler vendor) in those cases.
In any case, the purpose of these containers is to provide a seed and a
standard direction. I would hope that they would reduce the tower of babel
that Ada containers are nowadays - by providing a style for other containers
to follow. No one is suggesting that these are sufficient to solve all
programming problems - just 80% of them, especially in prototypes and in Q&D
programs.
****************************************************************
From: Martin Dowie
Sent: Thursday, February 5, 2004 5:50 PM
> Dowie, Martin (UK) wrote:
> > I could but wasn't part of the purpose of the library to allow us to
> > do common things more easily? And I'd have to say I'd use a 'Quit'
> > version a _lot_ more than the current process everything,
> > every time one.
>
> It would be helpful if you could be specific about what kind of
> container you were using.
I was thinking, primarily, of a project that used single (bounded) lists to
hold commands (a basic, domain-specific, scripting language I guess),
one of which was 'stop this sequence of commands'.
This pattern has since shown itself to be quite common in embedded
systems - for either domain-specific scripting languages or graphics.
There is the other idiom where one is processing an iteration of items
and an external event occurs that stops the processing - e.g. the 'stop'
button is pushed on a GUI-search window, but it could equally be a
50Hz message over a 1553.
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 5, 2004 6:14 PM
> I was thinking, primarily, of a project that used single (bounded) lists to
> hold commands (a basic, domain-specific, scripting language I guess),
> one of which was 'stop this sequence of commands'.
My understanding of the model is that passive iterators are only for cases
where you want to iterate over the entire container. Thus, this is clearly a
use for an active iterator. Indeed, given the iteration model of packages,
there's hardly any reason to use a passive iterator. They're harder to write
(a subprogram and instantiation are required), and (especially if a Quit
parameter is provided), harder to understand.
We dropped the passive iterator from the Ada.Directories package precisely
because even ARG members were confused about how it worked, even though it
was a classic passive iterator with a Quit parameter. Perhaps the confusion
really was the Quit parameter (I thought it was the whole idea), but in any
case, you've got to keep them simple.
> This pattern has since shown itself to be quite common in embedded
> systems - for either domain-specific scripting languages or graphics.
>
> There is the other idiom where one is processing an iteration of items
> and an external event occurs that stops the processing - e.g. the 'stop'
> button is pushed on a GUI-search window, but it could equally be a
> 50Hz message over a 1553.
It seems to me that an abort situation is best handled by propagating an
exception. Otherwise, you end up distributing termination code/flags
everywhere in the application. But YMMV.
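[Editor's note: A minimal sketch of stopping a passive iterator by
propagating an exception, as suggested above. It is not from the proposal;
Item_Array, Visit, and Stop_Requested are hypothetical names, and a plain
array stands in for a container.]

```ada
with Ada.Text_IO;

procedure Abort_Sketch is
   type Item_Array is array (Positive range <>) of Integer;

   --  A passive iterator with no Quit/Done parameter.
   generic
      with procedure Process (Item : in Integer);
   procedure Generic_Iteration (Items : in Item_Array);

   procedure Generic_Iteration (Items : in Item_Array) is
   begin
      for I in Items'Range loop
         Process (Items (I));
      end loop;
   end Generic_Iteration;

   Stop_Requested : exception;
   Visited        : Natural := 0;

   --  Raises Stop_Requested on a negative item (a stand-in for the
   --  "stop" command or external event).
   procedure Visit (Item : in Integer) is
   begin
      if Item < 0 then
         raise Stop_Requested;
      end if;
      Visited := Visited + 1;
   end Visit;

   procedure Visit_All is new Generic_Iteration (Process => Visit);
begin
   begin
      Visit_All ((10, 20, 30, -1, 40));
   exception
      when Stop_Requested =>
         null;  --  iteration was cut short on purpose
   end;
   Ada.Text_IO.Put_Line ("Visited:" & Natural'Image (Visited));
end Abort_Sketch;
```

The iterator itself carries no termination logic; only the caller that
wants to stop early needs a handler.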
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 5, 2004 6:39 PM
Matthew Heaney wrote:
> Alexandre K. and Jeff C. are both incorrect. The container is not a
> list, not a Java_Vector, not an Extensible_Array, and not a
> Flexible_Array.
Matthew H. is incorrect. The data structure is not a vector.
I am at least as qualified as Matthew H. to make such pronouncements.
****************************************************************
From: Jeffrey Carter
Sent: Friday, February 6, 2004 1:05 PM
A comment on type names.
Ada 83, with the unfortunate* exception of File_Type, did not use
"_Type" on the end of predefined type names. We have Address and Count,
not Address_Type and Count_Type. Ada 95 adhered to this principle, so we
have Storage_Element and Unbounded_String, not Storage_Element_Type and
Unbounded_String_Type.
For consistency, I think the Ada-0X process should also adhere to this
principle. The use of "_Type" on type names in the proposal should be
eliminated. This takes some time and thought to do well; I am willing to
volunteer for the effort if the Committee cannot spare the time and
cannot find anyone preferable.
This is a matter of consistency. While it is not my style, and not
recommended by the Quality and Style Guide, I have used libraries that
use the "_Type" convention without problem. I am concerned that the ARM
be consistent far more than I am about what convention the ARM uses.
*"Unfortunate" because it is inconsistent.
****************************************************************
From: Matthew Heaney
Sent: Friday, February 6, 2004 9:33 AM
I have updated the reference implementation, which now has the sorted
set container, too.
There's also a test_sets.adb, so you have something to run. You can
pass a seed on the command line.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040206.zip>
I'll take care of the hashed map containers this weekend, and post Mon AM.
****************************************************************
From: Matthew Heaney
Sent: Friday, February 6, 2004 3:36 PM
Martin Dowie wrote:
> I was thinking, primarily, of a project that used single (bounded) lists to
> hold commands (a basic, domain-specific, scripting language I guess),
> one of which was 'stop this sequence of commands'.
It sounds like you have a sequence container, that you traverse from
front to back.
The only sequence container in the proposal is a vector, which doesn't
have a passive iterator. Again, I recommend just using a loop:
   for Index in First (V) .. Last (V) loop
      declare
         Command : Command_Type := Element (V, Index);
      begin
         exit when Is_Stop (Command);
         -- process command
      end;
   end loop;
If these are commands that have an order (say, each command has a
timestamp, and commands are executed in timestamp order), then you can
use the sorted set. Again, an explicit loop is appropriate:
   declare
      I : Cursor_Type := First (S);
      J : constant Cursor_Type := Back (S);
   begin
      while I /= J loop
         declare
            Command : Command_Type := Element (I);
         begin
            exit when Is_Stop (Command);
            -- process command
         end;
         Increment (I);
      end loop;
   end;
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Friday, February 6, 2004 4:24 PM
> The only sequence container in the proposal is a vector,
Ah, yes, it's Sequence - quite right name for that container (and not Vector).
****************************************************************
From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:17 PM
Randy Brukardt wrote:
> Let's see:
> - direct analogs to indexing, both LHS and RHS (Element, Replace_Element);
> - slices (nope);
> - 'First (First), 'Last (Last), 'Length (Length);
>
> Looks like pretty much everything is in there. And slicing will be expensive
> if the implementation is not a straight array, so it's somewhat dubious.
> Insert and Delete provide easier ways of adding or removing items than
> slices - and how often do you use a slice of a non-string type for something
> other than inserting or deleting elements anyway??
Slicing isn't included because C++ doesn't have slices, so it's a
foreign concept to its library and users. If we want to attract users of
inferior languages to Ada, it should be because Ada is better. Ada's
slices are a way that Ada is better; Ada's standard extensible array
component should be better than its competition by also offering them. I
do not see mimicking C++'s shortcomings as advisable.
Insertion and deletion are basic operations of lists, but not of arrays.
That's why the list and vector components had the same set of
operations: they both specify lists with different implementations.
Since String is an array, and [Un]Bounded_String is an extensible array,
and we're now told the correct name is Vector, shouldn't these be
renamed to something like Character_Vector?
> Ada doesn't support (and isn't going to support) user-defined indexing or
> user-defined attributes, so this is about the best you can do. So what's the
> complaint (other than the name)??
I don't expect user-defined indexing, slices, or attributes, which is
why I talked about "analogs" to them. Missing slices is one complaint.
And, yes, the name is unarguably wrong.
In the C family of languages, users are accustomed to having to look at
implementations in order to understand how to use something. Subprogram
"prototypes" (yet another misused term to add to the collection) are
generally insufficient, and appropriate comments are often lacking. So
it comes as no surprise to me that C++ expects newcomers to its library,
looking for an extensible array, and not finding anything with an
appropriate name, to have to look at the operations of the components to
find that the inappropriately named "vector" is really an extensible array.
However, this is not the Ada way, and I think it completely
inappropriate to mimic this mistake. Looking at other languages'
library to select useful components is fine; insisting that an Ada
version must be identical to that of another language, including
mistakes, is not.
> The user can easily code a queue in terms of a Vector (that's one of the
> uses of Insert!). We dropped the list component because it had an identical
> interface to the Vector component, but was less flexible (no computed O(1)
> access).
The performance of a queue based on an extensible array is likely to be
just as objectionable as extracting an element from an extensible array
based on a list. That the vector and list components both had the same
interface is further evidence that mimicking the STL is a bad idea.
Insert and delete are as foreign to an extensible array as indexing and
slicing should be to a list.
> In my view, it is a mistake for projects to depend on standard containers
> where there are critical performance requirements (not just time, but also
> space as well). In that case, you really have to have control of the
> implementation -- you really need *all* of the source code. You can't trust
> something provided by the standard (or your compiler vendor) in those cases.
I agree. That doesn't mean that the standard shouldn't provide a basis
for queues with performance characteristics suitable for performance
non-critical applications, which an extensible array does not provide.
****************************************************************
From: Randy Brukardt
Sent: Friday, February 6, 2004 8:24 PM
Jeff Carter wrote:
...
> I agree. That doesn't mean that the standard shouldn't provide a basis
> for queues with performance characteristics suitable for performance
> non-critical applications, which an extensible array does not provide.
Huh? You've said, in effect, that the performance isn't good enough for
applications where the performance doesn't matter. That's a pretty goofy
statement!
My opinion has not changed: if you care about performance *at all*, you
*cannot* depend on *any* standard containers. But usually the performance
does not matter at all (or so little as to be equivalent to not at all): the
number of elements in the container is small (which would be true for
virtually all queues), and/or it is used infrequently, and/or the
application is a throw-away.
Otherwise, if you are writing portable code, you shouldn't use a predefined
container library at all -- the performance is likely to vary much more
across implementations than code you write yourself. For instance, on
Janus/Ada, any generic list container is going to run 2-5 times slower than the
same list created yourself -- that's just the effect of the extra call
overhead and the shared body (which means the elements will be dynamically
allocated - separately - in any case - at least doubling the allocation
overhead). I'd expect that effect to be much less on GNAT, for example,
because they don't share generic bodies and thus don't have the double
allocation overhead.
If your application doesn't care about the component being 5 times slower,
then it is highly unlikely that it is going to care about whether the
Vector/Sequence/List component is implemented as an array, as a list, as a
tree, as a hash table, or as something else.
My preference with these components would be to say absolutely nothing about
performance or implementation (because anything said is as meaningless as
real-time metrics are). But others believe that that would cause real
portability problems, and I'm willing to go along with that.
The problem I see is a lot of people are looking far too closely at tiny
pieces of abstractions. You might have a queue or a list as part of a large
abstraction, but they're pretty much useless by themselves. And given that
creating a queue or stack (both of which have only two operations, both
trivial!) would take 3 minutes max, it makes no sense to use a complex (and
necessarily slow) container library for just that -- indeed, it probably
would be more work to use a container than the 3 minutes.
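[Editor's note: As a rough illustration of how small a hand-written queue
can be, here is a minimal bounded queue with its two operations. It is a
sketch under assumed names (Queue, Enqueue, Dequeue, a fixed Capacity of 8),
not code from anyone in this thread.]

```ada
with Ada.Text_IO;

procedure Queue_Sketch is
   Capacity : constant := 8;

   --  Modular index type, so that index arithmetic wraps around.
   type Slot is mod Capacity;
   type Buffer is array (Slot) of Integer;

   type Queue is record
      Items : Buffer;
      Head  : Slot := 0;      --  position of the oldest element
      Count : Natural := 0;   --  number of elements stored
   end record;

   procedure Enqueue (Q : in out Queue; Item : in Integer) is
   begin
      if Q.Count = Capacity then
         raise Constraint_Error;  --  queue is full
      end if;
      Q.Items (Q.Head + Slot (Q.Count)) := Item;
      Q.Count := Q.Count + 1;
   end Enqueue;

   procedure Dequeue (Q : in out Queue; Item : out Integer) is
   begin
      if Q.Count = 0 then
         raise Constraint_Error;  --  queue is empty
      end if;
      Item := Q.Items (Q.Head);
      Q.Head := Q.Head + 1;
      Q.Count := Q.Count - 1;
   end Dequeue;

   Q : Queue;
   X : Integer;
begin
   for I in 1 .. 3 loop
      Enqueue (Q, I * 10);
   end loop;
   while Q.Count > 0 loop
      Dequeue (Q, X);
      Ada.Text_IO.Put_Line (Integer'Image (X));
   end loop;
end Queue_Sketch;
```

Both operations are constant time, which a queue built on Insert at the
front of an array-backed vector would not generally be.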
I much prefer the vision of this containers library, where the only
containers included are those that are large, complex, multi-purpose, and
have a clear abstraction.
****************************************************************
From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:39 PM
Matthew Heaney wrote:
> No. Vector iterators are fragile, and hence very error prone.
Modifying a structure from an iterator should be a bounded error.
> They are fragile because the (logical) internal array gets thrown away
> during expansion, which invalidates the iterator. It's too hard to keep
> track of whether a vector iterator is still valid, and most of the time
> you end up with a dangling reference.
You can only talk about what happens internally during an operation if a
specific implementation is required, which Randy assures us is not the case.
> A "set" is really any sorted sequence of items. If you want set
> intersection, symmetric difference, etc, then just use a generic
> algorithm. See the Charles library for such algorithms.
I've used sets for decades, in discrete math, in specification languages
such as Z, and in programming. A set is an unordered collection of
elements from a universe that provides operations such as membership,
union, intersection, and the like, represented by mathematical symbols
that I can't reliably represent in an e-mail.
An implementation of a set may be sorted to speed up operations, but
that's a feature of the implementation, not of the concept implemented.
That's a distinction that many users of C-family languages seem unable
to make, but that I expect from those who embrace Ada.
> The name for Delete_Sans_Increment comes from Emacs lisp, which has the
> functions file-name-sans-extension and file-name-sans-versions.
Yet another case of mimicking others' errors.
> It was also in homage to Ada's French history, given that her original
> designer was French, and worked for a French company.
>
> Why do you think "rendezvous" was named that way?
"Rendezvous" is not a predefined identifier in the ARM. It was chosen
because no English word has the precise meaning intended, and Ada's
designers understood the importance of precise terminology.
> If you don't immediately grok how vectors and sets and maps work, then I
> suggest familiarizing yourself with the STL. There are lots of tutorials
> on the WWW.
I've been using arrays, including extensible arrays, sets, and maps for
decades. I've also been using vectors for decades, having done a lot of
scientific programming that required matrix math. I doubt that a study
of C++ mistakes would have any effect besides raising my blood pressure.
****************************************************************
From: Jeffrey Carter
Sent: Friday, February 6, 2004 7:22 PM
Randy Brukardt wrote:
> Precisely my point. That is intended to say that there is a logical array in
> the container, but not necessarily an actual one. Matt's descriptions were
> too implementation-specific, and we moved most of that. But I'm not
> surprised that some was missed.
On closer inspection, the Size and Resize operations certainly imply an
array implementation; they are meaningless otherwise.
****************************************************************
From: Randy Brukardt
Sent: Friday, February 6, 2004 9:09 PM
Huh? Resize tells the container a reasonable size to use; what the container
does with that information is up to it. Size simply returns that
information.
That's no different than many of the attributes in Ada, which (if set),
always return the values that they were set to. But what the compiler does
with those values is (almost) completely implementation-defined.
The only real requirement here is O(1) element access (which prevents the
use of a straight linked list).
Janus/Ada will probably use an array of pointers (or possibly array of
arrays of pointers); we're going to be (implicitly) allocating the elements
anyway, we might as well do it explicitly and take advantage of that to make
Insert/Delete/Sort (and any expansions) much cheaper (presuming the elements
are bigger than scalar types). An array of arrays of pointers is even
better, because insertion cost is bounded by the maximum size of an array
chunk -- but there is more overhead and complexity, so I'd like to see some
real uses before deciding on an implementation.
Note that a pure list component has no real opportunity for "better"
implementations, and indeed, any implementation on Janus/Ada would suffer
from "double" allocation.
****************************************************************
From: Martin Dowie
Sent: Saturday, February 7, 2004 4:02 AM
> We dropped the passive iterator from the Ada.Directories package precisely
> because even ARG members were confused about how it worked. Even though it
> was a classic passive iterator with a Quit parameter. Perhaps the confusion
> really was the Quit parameter (I thought it was the whole idea), but in any
> case, you've got to keep them simple.
I didn't find it confusing so I provided an extra child
Ada.Directories.Iterate - and I've used it repeatedly!
> > This pattern has since shown itself to be quite common in embedded
> > systems - for either domain-specific scripting languages or graphics.
> >
> > There is the other idiom where one is processing an iteration of items
> > and an external event occurs that stops the processing - e.g. the 'stop'
> > button is pushed on a GUI-search window, but it could equally be a
> > 50Hz message over a 1553.
>
> It seems to me that an abort situation is best handled by propagating an
> exception. Otherwise, you end up distributing termination code/flags
> everywhere in the application. But YMMV.
I have tended to work in deeply embedded systems, where exceptions (in
any language!) are at best frowned upon and quite often forbidden! :-(
****************************************************************
From: Martin Dowie
Sent: Saturday, February 7, 2004 4:25 AM
> > I was thinking, primarily, of a project that used single (bounded) lists to
> > hold commands (a basic, domain-specific, scripting language I guess),
> > one of which was 'stop this sequence of commands'.
>
> It sounds like you have a sequence container, that you traverse from
> front to back.
Pretty much, although we also read in where each 'First' is as the whole
contained many 'subroutines'.
> The only sequence container in the proposal is a vector, which doesn't
> have a passive iterator. Again, I recommend just using a loop:
I suspect the first thing I will do is add an extra child generic subprogram
Ada.Containers.Vectors.Iterate! :-)
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 6:16 AM
> I suspect the first thing I will do is add an extra child generic
> subprogram Ada.Containers.Vectors.Iterate! :-)
Well, I guess you don't use GNAT. GNAT gets quite upset if you try to add something
to the Ada packages.
****************************************************************
From: Marius Amado Alves
Sent: Saturday, February 7, 2004 7:45 PM
I'd expect *any* compiler to get really upset with this ;-)
****************************************************************
From: Martin Dowie
Sent: Sunday, February 8, 2004 2:08 AM
"gcc -gnatg" or "gnatmake -a" will stop any warnings :-)
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 5:09 AM
> Jeffrey Carter wrote:
> > Given a list, structures such as
> > deques, stacks, and especially queues are easy to implement. Since
> > queues are common structures and not redundant (none of the proposed
> > containers provides an efficient implementation of a queue), the
> > proposal itself seems to argue that lists should be provided, since they
> > are not easy to code correctly, and provide a basis for the user to
> > easily code queues.
> The user can easily code a queue in terms of a Vector (that's one of the
> uses of Insert!). We dropped the list component because it had an identical
> interface to the Vector component, but was less flexible (no computed O(1)
> access).
True enough. But if you wanted to build a generic queue on top of the vector, the
tag should not be hidden from view. Otherwise one needs to repeat all the
access methods instead of just renaming the ones provided by the parent
package.
In fact the hidden tag is the one feature which I really dislike in Charles.
****************************************************************
From: Stephen Leake
Sent: Saturday, February 7, 2004 8:40 AM
"Randy Brukardt" <randy@rrsoftware.com> writes:
> Report of the ARG Select Committee on Containers
> February 3, 2004
Thanks for the committee's hard work on this.
What is the rationale for making the Map Key_Type definite, as opposed
to indefinite? Since an indefinite Key_Type is required for
Containers.Maps.Strings, why not make that capability available to the
users?
I don't see a discussion of this in AI-302-03/01.
Another point: Containers.Vectors.Size should return Index_Type'Base,
and the Size parameter in Resize should also be Index_Type'Base. It's
confusing to have different types for Size and Index.
There's also a problem if Natural'Last < Index_Type'Last; you
can't have a vector that contains every index!
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:03 PM
> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite?
The 'committee' primarily adopted the existing proposal submitted by Matt
Heaney. We decided not to change any of the major design decisions of that
proposal - because no package will suit everyone or every need, and we felt
it was more important to standardize something coherently designed for most
needs than to fiddle endlessly with it and risk introducing serious bugs.
Which is to say, I don't know. :-)
> Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?
We definitely expect that the strings container will use a purpose-built
data structure for storing strings, not some general indefinite item
capability. Ways to compactly and efficiently store sets of varying size
strings are well known and commonly used.
Such algorithms could be extended to a general "unconstrained array of
elementary", but that hardly seems to be a worthwhile definition for keys.
...
> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.
>
> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!
Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you
want the Size and Resize operations to take a numeric type that contains
zero -- and certainly Index_Type is not that. Index_Type could be a subtype
of an enumeration type or a subtype of a modular type (neither of which can
contain zero) or a subtype of an integer type not containing zero.
We had a short, inconclusive discussion about whether the index type ought
to be range <> rather than (<>) (because enumeration and modular types fail
the assertion and thus aren't directly usable), but that still doesn't
guarantee a zero. Moreover, if the integer type has negative numbers, then
the Length of the vector could be larger than Index_Type'Last.
So I don't see a great solution. I wondered about using "Hash_Type" here (it
has the correct properties), but that seems like a misuse of the type (and a
bad idea in a library that most Ada programmers will read - you want to show
them good style in standard libraries).
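[Editor's note: The index/length mismatch described above can be sketched
in a few lines. This is only an illustration; Index_Type, Year_Index and
Table are hypothetical names chosen for the example and are not taken from
the proposal.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Index_Count_Demo is
   --  Hypothetical index type whose range includes negative values.
   type Index_Type is range -10 .. 10;

   --  Hypothetical index subtype that does not contain zero.
   subtype Year_Index is Integer range 1900 .. 2100;

   type Table is array (Index_Type) of Integer;
   T : constant Table := (others => 0);
begin
   --  The number of elements (21) is larger than Index_Type'Last (10),
   --  so Index_Type cannot represent the length of a full table.
   Put_Line ("Index_Type'Last =" & Index_Type'Image (Index_Type'Last));
   Put_Line ("Length          =" & Integer'Image (T'Length));

   --  Zero is not a value of Year_Index, so that subtype cannot
   --  represent the length of an empty container.
   Put_Line ("Year_Index'First =" & Integer'Image (Year_Index'First));
end Index_Count_Demo;
```

Both cases suggest why a separate counting type that always includes zero
is attractive for Size and Resize.]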
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 5:15 AM
> The performance of a queue based on an extensible array is likely to be
> just as objectionable as extracting an element from an extensible array
> based on a list. That the vector and list components both had the same
> interface is further evidence that mimicking the STL is a bad idea.
> Insert and delete are as foreign to an extensible array as indexing and
> slicing should be to a list.
Well, that depends. Most queues are not supposed to grow indefinitely, so using a
vector with a modular type as index will give you good performance. Every Ada
tutorial contains an example of how to do it.
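[Editor's note: A minimal sketch of the technique mentioned above, a
bounded queue whose positions are values of a modular type so that index
arithmetic wraps around automatically. It uses a plain array rather than
the proposed vector container, and all names (Capacity, Slot, Enqueue,
Dequeue) are assumptions made for the example.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Ring_Queue_Demo is
   Capacity : constant := 4;
   type Slot is mod Capacity;          --  "+ 1" wraps from 3 back to 0
   type Buffer is array (Slot) of Integer;

   Data  : Buffer := (others => 0);
   Head  : Slot := 0;                  --  position of the next item to remove
   Tail  : Slot := 0;                  --  position of the next free slot
   Count : Natural := 0;               --  number of items currently queued

   procedure Enqueue (Item : in Integer) is
   begin
      if Count = Capacity then
         raise Constraint_Error;       --  queue is full
      end if;
      Data (Tail) := Item;
      Tail  := Tail + 1;
      Count := Count + 1;
   end Enqueue;

   procedure Dequeue (Item : out Integer) is
   begin
      if Count = 0 then
         raise Constraint_Error;       --  queue is empty
      end if;
      Item  := Data (Head);
      Head  := Head + 1;
      Count := Count - 1;
   end Dequeue;

   X : Integer;
begin
   --  Six insertions into a four-slot buffer: Head and Tail wrap around.
   for I in 1 .. 6 loop
      Enqueue (I);
      Dequeue (X);
      Put_Line (Integer'Image (X));
   end loop;
end Ring_Queue_Demo;
```

Both operations take constant time, which is the performance argument
being made; the cost is that the capacity is fixed in advance.]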
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 6:14 AM
> The committee selected the second proposal as a starting point for a
> standard containers library, with a number of simple changes. The
> changes were simple enough that we produced a version of the library with
> the changes made (AI-00302-3/01).
Any place where I can actually read the draft?
Anyway, looking at the reference implementation from Matthew Heaney (thanks for
the quick response) I have an improvement to suggest:
type Element_Type is private;
I said this before: that is too limiting. With that signature you can't even
store strings. And more importantly, you can't store Element'Class. In fact I
predict that with that signature 80% of all data stored will be "access to
something".
I have often heard that Ada does not need garbage collection since a good container
library should take care of memory management - and now I am ready to follow
that point. But taking that argument, vector is not a good container.
Since vector will need heap storage anyway and performance is only a minor
issue I suggest:
type Element_Type (<>) is private;
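[Editor's note: To illustrate what an indefinite formal element type makes
possible, the sketch below stores strings of different lengths directly in
a container. It uses Ada.Containers.Indefinite_Vectors, which was
standardized later (Ada 2005) and is not part of the draft under discussion
in this thread; it is shown only as one way the request could look to a
user.

```ada
with Ada.Containers.Indefinite_Vectors;
with Ada.Text_IO; use Ada.Text_IO;

procedure Indefinite_Demo is
   --  Element_Type is String, an indefinite subtype; this instantiation
   --  would be illegal if the formal were "type Element_Type is private;".
   package String_Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type => Positive, Element_Type => String);

   V : String_Vectors.Vector;
begin
   V.Append ("short");
   V.Append ("a considerably longer string");

   for I in V.First_Index .. V.Last_Index loop
      Put_Line (V.Element (I));
   end loop;
end Indefinite_Demo;
```

The container allocates and frees the element storage itself, so no
user-declared access type is needed.]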
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:05 PM
> Any place where I can actually read the draft?
The same place that you can read any other AI: www.ada-auth.org.
****************************************************************
From: Martin Krischik
Sent: Sunday, February 8, 2004 4:58 AM
I looked there but I only found a very long discussion but not the
actual concluding decision.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 6:03 PM
Don't know what you're looking for, but certainly the entire AI is posted
there. As with all AIs, the !wording section is what goes into the standard.
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 6:24 AM
> > The only sequence container in the proposal is a vector,
>
> Ah, yes, it's Sequence - quite right name for that container (and not
> Vector).
No, in my book elements in a Sequence have only relative positions, or at
least the relative position is the primary position and absolute position is
only the secondary.
That is: Get_Next (V); is faster or as fast as Get (V, 5);
****************************************************************
From: Martin Krischik
Sent: Saturday, February 7, 2004 6:32 AM
> My understanding of the model is that passive iterators are only for cases
> where you want to iterate over the entire container.
Yes.
> Indeed, given the iteration model of packages,
> there's hardly any reason to use a passive iterator.
Passive iterators should always provide the fastest means to iterate over the
whole container. They should do so by knowing the internals of the container.
Of course it only matters in advanced containers with B-Trees or AVL-Trees as
internal structure. But I have only seen those in IBM's Open Class Library
(which is far better than the STL).
But there are no advanced containers in AI 302.
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 6:21 PM
> Passive iterators should always provide the fastest means to iterate over the
> whole container. They should do so by knowing the internals of the
> container.
That might be true in a language with a built-in iterator construct, but it
is certainly not true in Ada because of the overhead of calling the generic
formal subprogram for each element. In Janus/Ada, the overhead of calling a
formal subprogram is at least double of a normal subprogram (we have to save
and restore display information, because you could be calling into a more
nested scope than the generic body -- something that normally isn't possible
in Ada).
Other compilers may not have that overhead, but they'll certainly have call
overhead. Whereas, the explicit loop iterator for Vectors only needs to call
Element. So the call overhead is at best a wash, and at worst much worse for
the passive iterator. Moreover, the compiler is a lot more likely to be able
to in-line the call to Element (which likely has a pretty simple
implementation and thus will meet the in-lining qualifications), than the
bunch of arbitrary code in the Process formal routine.
So, a passive iterator will only be faster in complex containers (where you
have to separate the Element and Successor functions). For a Vector (where
the language already has the needed iteration mechanism built-in), it's
going to be slower (or, if you're really lucky, the same speed) and it
certainly is a lot harder to write.
So I think having it on Vector would simply be for consistency; you'd never
actually use it if you know you're dealing with a Vector.
****************************************************************
From: Robert A. Duff
Sent: Saturday, February 7, 2004 7:22 PM
> Other compilers may not have that overhead, but they'll certainly have call
> overhead. Whereas, the explicit loop iterator for Vectors only needs to call
> Element. So the call overhead is at best a wash, and at worst much worse for
> the passive iterator. Moreover, the compiler is a lot more likely to be able
> to in-line the call to Element (which likely has a pretty simple
> implementation and thus will meet the in-lining qualifications), than the
> bunch of arbitrary code in the Process formal routine.
I don't see why the compiler shouldn't inline the Process routine,
assuming the compiler isn't doing shared generics. They're usually
small, but anyway, the Process routine is typically called exactly
once, so it shouldn't matter how big it is.
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 7:33 PM
Most compilers have limitations on what can be inlined; Process (which
contains arbitrary code) is far more likely to violate one of those
limitations than Element (which never changes and is likely to be very
simple). In addition, many compilers only inline when you give pragma
Inline, and you can't do that on a generic formal.
****************************************************************
From: Robert A. Duff
Sent: Saturday, February 7, 2004 7:43 PM
If Process violates whatever these arbitrary restrictions are, then
sure, you can't get it inlined. But typically Process is very simple --
often just one line of code that calls some other procedure to do the
real work, passing some additional parameters. Process isn't a "real"
procedure, conceptually -- it's just the body of a loop.
In my current project, we make heavy use of the generic iterator
pattern, and I think that in many many cases, Process is just
a line or two of code. (And if it's more, inlining is relatively
less important.)
>... In addition, many compilers only inline when you give pragma
> Inline, and you can't do that on a generic formal.
You give the inline on the actual. In non-sharing implementations,
that should apply inside the instance. And the iterator procedure
itself can be inlined, too.
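[Editor's note: The two iteration styles compared in this exchange can be
sketched as follows. The example iterates over a plain array rather than
the proposed Vector, and Int_Array, Generic_Iteration, Add and Sum_All are
hypothetical names. Whether pragma Inline on the actual has any effect
inside the instance is implementation dependent, as the discussion notes.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Iteration_Demo is
   type Int_Array is array (Positive range <>) of Integer;

   --  Passive iterator: the loop lives in the generic body and calls
   --  the formal subprogram Process once per element.
   generic
      with procedure Process (Item : in Integer);
   procedure Generic_Iteration (A : in Int_Array);

   procedure Generic_Iteration (A : in Int_Array) is
   begin
      for I in A'Range loop
         Process (A (I));
      end loop;
   end Generic_Iteration;

   Data : constant Int_Array := (10, 20, 30, 40, 50);
   Sum  : Integer := 0;

   --  The actual for Process; the Inline request is given on the actual.
   procedure Add (Item : in Integer);
   pragma Inline (Add);

   procedure Add (Item : in Integer) is
   begin
      Sum := Sum + Item;
   end Add;

   procedure Sum_All is new Generic_Iteration (Process => Add);
begin
   --  Explicit loop: the loop body is written in place.
   for I in Data'Range loop
      Sum := Sum + Data (I);
   end loop;
   Put_Line ("explicit loop:" & Integer'Image (Sum));

   --  Passive iterator: same result, one call to Add per element
   --  unless the compiler inlines it.
   Sum := 0;
   Sum_All (Data);
   Put_Line ("passive iterator:" & Integer'Image (Sum));
end Iteration_Demo;
```

The explicit loop needs no extra declarations, which is part of the
argument that a passive iterator on a vector would exist mainly for
consistency.]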
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 8:04 PM
Certainly it's not real (which is one thing I dislike about passive
iterators in Ada - but we've discussed that before), but if it is very short
(or the bodies of your loops are typically very short), then your
programming style must be very different from mine. The only loops that I
write that are very short are those that I probably shouldn't have written
in the first place (like the one finding the last '.' in a string) --
there's a routine somewhere in Ada.Strings that will do the job, but looking
it up is more work than writing the loop. (And a lot of them would be
replaced by a Vector/List/Sequence container if I had one.)
But just looking at the spam filter I'm working on at this moment: The
average loop length is about 25 lines, the median is around 8 lines. (There
are more short loops than I would have guessed. But most of them wouldn't
exist if I had a container to use instead - most of them are insert-at-end
or delete-specific-item from a list.)
...
> You give the inline on the actual. In non-sharing implementations,
> that should apply inside the instance. And the iterator procedure
> itself can be inlined, too.
At which point, you *equal* the performance of the active iterator. And only
if *everything* goes right. The OP claimed that the passive iterator would
always have better performance, and that's certainly not true for the vector
container. I doubt that it would be true for the Map container, either. It
could be true for a complex container, but those aren't commonly used.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Saturday, February 7, 2004 7:55 PM
Martin Krischik wrote:
> > > The only sequence container in the proposal is a vector,
> >
> > Ah, yes, it's Sequence - quite right name for that container (and not Vector).
>
> No, in my book elements in a Sequence have only relative positions, or at
> least the relative position is the primary position and absolute position is
> only the secondary.
I don't know in which domain your book was grown up, but I can assure you that
in mathematics (and by extension in physics and other natural sciences as they
use mathematical apparatus) elements of a sequence are commonly indexed, and
those indices are always treated as absolute position (which may be zero or
even negative). By the way, your book is also certainly not from Biology/Genetics,
where term "sequence" is used heavily, and they often speak about both absolute
and relative positions in sequences.
We have clearly different usage of terms "vector" and "sequence": substantial
part of today's software engineering (tools and books) use them one way, while
mathematics (and all natural sciences that use it heavily) always use them another
way.
So all the argument here about Vector/Sequence here is about Ada's choice of
preference: will Ada choose software engineering (effectively, Java and C++
libraries) side or mathematical/scientific side on this issue.
I suppose (or hope) that the thesis "Ada is for problem space, not for solution
space" implies the latter.
****************************************************************
From: Martin Krischik
Sent: Sunday, February 8, 2004 11:40 AM
> I don't know in which domain your book was grown up, but I can assure you
It's the English dictionary: "Aufeinanderfolge, Reihenfolge, Szene,
Zeitfolge". Ah, you don't speak German. Well, let's look for "Reihenfolge" in
a Russian dictionary (and have a fight with my wife's Russian keyboard):
"???????????".
Asking my wife what it means, she said "one after the other, queue".
> that in mathematics (and by extension in physics and other natural sciences
> as they use mathematical apparatus) elements of a sequence are commonly
> indexed, and those indices are always treated as absolute position (which
> may be zero or even negative). By the way, your book is also certainly not
> from Biology/Genetics, where term "sequence" is used heavily, and they
> often speak about both absolute and relative positions in sequences.
I have spent 4 years in Great Britain. I am sure that if I ask anyone on the street
there "what is a sequence" he or she will answer something like "one after the
other" - and that is relative positioning.
> We have clearly different usage of terms "vector" and "sequence":
> substantial part of today's software engineering (tools and books) use them
> one way, while mathematics (and all natural sciences that use it heavily)
> always use them another way.
Even when it comes down to software engineering: IBM's Open Class Library has
a Sequence - for relative positioning: getFirst, getNext, insertAfter. Usually
used to fill listboxes.
> So all the argument here about Vector/Sequence here is about Ada's choice
> of preference: will Ada choose software engineering (effectively, Java and
> C++ libraries) side or mathematical/scientific side on this issue.
I don't like the STL that much. So I am not really defending "vector".
> I suppose (or hope) that the thesis "Ada is for problem space, not for
> solution space" implies the latter.
I agree with you on that too.
But I think we are off topic here.
****************************************************************
From: Marius Amado Alves
Sent: Saturday, February 7, 2004 8:41 PM
Randy Brukardt wrote:
>The 'committee' primarily adopted the existing proposal submitted by Matt
>Heaney. We decided not to change any of the major design decisions of that
>proposal - because no package will suit everyone or every need, and we felt
>it was more important to standardize something coherently designed for most
>needs than to fiddle endlessly with it and risk introducing serious bugs.
>
>Which is to say, I don't know. :-)
I do: there is none (except perhaps the implicit one: ease of
implementation). On the other hand, there is a rationale for indefinite
elements. This requirement has been largely felt and voiced since ever,
and I included it in my Bases document (I think stored in alternative
1), and even formulated it as an Annex (stored in alternative 2 but
applicable to any alternative). But I've always seemed to feel some
resistance from Matt and the ARG. Which resistance I find inexplicable.
I really don't see how making the element type indefinite may
"compromise coherence" or "introduce bugs". Sure it complicates the
implementation. But the increase in power for the user is a quantum
leap, as it frees him from doing tricky memory management in many
situations. In my proposed Annex I included this passage from someone
who should be dear to at least one person in that group--perhaps in the
hope of making those strange walls of resistance just shiver a bit:
<<If I ask a student whether her design is as good as Chartres, she often smiles tolerantly
at me as if to say, "Of course not, that isn't what I am trying to do.... I could never do
that." Then, I express my disagreement, and tell her: "That standard *must* be our
standard. If you are going to be a builder, no other standard is worthwhile.">>
-- Christopher Alexander, Foreword to [Gabriel 1996]
****************************************************************
From: Randy Brukardt
Sent: Saturday, February 7, 2004 9:20 PM
> I do: there is none (except perhaps the implicit one: ease of
> implementation). On the other hand, there is a rationale for indefinite
> elements.
Perhaps. But that wasn't the question. The question was why aren't there
indefinite *keys*.
...
> But I've always seemed to feel some
> resistance from Matt and the ARG.
Given that the "ARG" (other than the subcommittee) has not yet looked at
these proposals, that's a pretty bizarre statement.
...
> I really don't see how making the element type indefinite may
> "compromise coherence" or "introduce bugs". Sure it complicates the
> implementation.
And, on most implementations, I would expect it to make it *many* times
slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
already have to allocate an element at a time anyway.) I would guess that it
is that efficiency concern that Matt is responding to. But I'll let him
respond himself...
****************************************************************
From: Marius Amado Alves
Sent: Sunday, February 8, 2004 6:26 AM
>... that wasn't the question. The question was why aren't there
>indefinite *keys*.
>
Oops... sorry.
Curiously enough if you have indefinite elements the requirement for
indefinite keys loses strength: you can then use elementary containers
or indefinite element positions as keys.
>...
>
>>But I've always seemed to feel some
>>resistance from Matt and the ARG.
>
>Given that the "ARG" (other than the subcommittee) has not yet looked at
>these proposals, that's a pretty bizarre statement.
Just a feeling. The proposals are there in the AI, and there was some
discussion.
>>I really don't see how making the element type indefinite may
>>"compromise coherence" or "introduce bugs". Sure it complicates the
>>implementation.
>
>And, on most implementations, I would expect it to make it *many* times
>slower....
No. The system should choose at compile time a specific body according to
the 'Definite attribute of the actual element type.
Aside. Of course there is still no standard means to do this, but it
would be a nice extension. Conditional compilation of generic bodies
based on instantiation properties. Variant units :-)
   generic
      type T is private;
      ...
   package G is
      when T'Definite =>
         ...;
      when others =>
         ...;
   end;
(On the subject of conditional compilation, see also the recent Ada
Preprocessor thread on CLA.)
In the meanwhile, there is no requirement that Ada.Containers be
implemented strictly in Ada, is there? I doubt any Ada 95 container
(arrays, files) is.
End of aside.
So no coherence problem, nor bugs, nor efficiency problem :-)
****************************************************************
From: Tucker Taft
Sent: Sunday, February 8, 2004 7:33 AM
I suggest the use of controlled types if you want implicit
levels of indirection in the keys or the elements. Having the
container worry about storage management issues relating to elements
or keys significantly increases their complexity. We very much
want these containers to be straightforward to define and use.
They are definitely not the final answer, but more the initial
answer -- the 20% that can handle 80% of the problems.
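[Editor's note: A rough sketch of the kind of user-written controlled type
suggested above: a handle that hides one level of indirection and counts
references to a shared, variable-length part. All names (Handles, Handle,
Shared, Create, Value, Use_Count) are hypothetical. Declaring the controlled
type inside a procedure requires Ada 2005 or later; under Ada 95 rules the
package would have to be moved to library level.

```ada
with Ada.Finalization;
with Ada.Unchecked_Deallocation;
with Ada.Text_IO; use Ada.Text_IO;

procedure Handle_Demo is

   package Handles is
      type Handle is new Ada.Finalization.Controlled with private;
      function Create (S : String) return Handle;
      function Value (H : Handle) return String;
      function Use_Count (H : Handle) return Natural;
   private
      --  The shared, heap-allocated part with its reference count.
      type Shared (Length : Natural) is record
         Count : Natural := 1;
         Text  : String (1 .. Length);
      end record;
      type Shared_Access is access Shared;

      type Handle is new Ada.Finalization.Controlled with record
         Ref : Shared_Access;
      end record;
      procedure Adjust (H : in out Handle);
      procedure Finalize (H : in out Handle);
   end Handles;

   package body Handles is
      procedure Free is
        new Ada.Unchecked_Deallocation (Shared, Shared_Access);

      function Create (S : String) return Handle is
      begin
         return (Ada.Finalization.Controlled with
                 Ref => new Shared'(Length => S'Length,
                                    Count  => 1,
                                    Text   => S));
      end Create;

      function Value (H : Handle) return String is
      begin
         return H.Ref.Text;   --  raises Constraint_Error for a null handle
      end Value;

      function Use_Count (H : Handle) return Natural is
      begin
         if H.Ref = null then
            return 0;
         end if;
         return H.Ref.Count;
      end Use_Count;

      --  Called after a handle has been copied: one more reference.
      procedure Adjust (H : in out Handle) is
      begin
         if H.Ref /= null then
            H.Ref.Count := H.Ref.Count + 1;
         end if;
      end Adjust;

      --  Called when a handle goes away: drop one reference and free
      --  the shared part when the last reference disappears.
      procedure Finalize (H : in out Handle) is
         R : Shared_Access := H.Ref;
      begin
         H.Ref := null;       --  a repeated Finalize then does nothing
         if R /= null then
            R.Count := R.Count - 1;
            if R.Count = 0 then
               Free (R);
            end if;
         end if;
      end Finalize;
   end Handles;

   use Handles;

   A : Handle := Create ("a large, variable-length object");
   B : Handle := A;           --  copies the reference, not the text
begin
   Put_Line (Value (B));
   Put_Line ("references:" & Natural'Image (Use_Count (A)));
end Handle_Demo;
```

A Handle is a definite type, so it could be used as an element or key of a
container whose formal type is definite, at the cost of writing and
maintaining this extra abstraction.]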
****************************************************************
From: Marius Amado Alves
Sent: Sunday, February 8, 2004 12:23 PM
>I suggest the use of controlled types if you want implicit
>levels of indirection in the keys or the elements.
That is exactly the problem. The user is forced to control. Waste of
time. And bug prone. The right controlled behaviour is very hard to get.
How many times is Finalize called?
> Having the
>container worry about storage management issues relating to elements
>or keys significantly increases their complexity.
If you mean inefficiency, no, at least not significantly: see the
variant unit solution. If you mean source code complexity, sure, a bit,
but so what?
> We very much
>want these containers to be straightforward to define and use.
>They are definitely not the final answer, but more the initial
>answer -- the 20% that can handle 80% of the problems.
With only definite elements I don't believe in the 80% figure. Just
think: don't you need heterogeneous arrays all the time? For class-wide
programming for example? And logical records? And words, texts,
pictures, all sort of variable length stuff?
BTW this is the kind of "resistance" I was talking about. No technical
arguments really. Just a vague downsize wish. The pointer tradition maybe.
****************************************************************
From: Marius Amado Alves
Sent: Sunday, February 8, 2004 12:41 PM
Just to make some things clear. I began championing indefinite elements
long ago. Wrote the proposals. They met the "resistance". I let it be. I
assumed the proposals had been viewed and were rejected. The recent
discussion made me wonder if the proposals had really been seen. So I
stepped in just to make sure. I don't want to discuss the issue itself.
That has been done. See the proposals (my Bases document stored in
alternative 1, my proposed Annexes in alternative 2, discussions in
ASCLWG, CLA and here). When I say I won't rediscuss the issue it doesn't
mean I won't give focused explanations here. I'll be glad to do it.
Thanks a lot.
****************************************************************
From: Tucker Taft
Sent: Sunday, February 8, 2004 4:25 PM
> ...
> > We very much
> >want these containers to be straightforward to define and use.
> >They are definitely not the final answer, but more the initial
> >answer -- the 20% that can handle 80% of the problems.
> >
> With only definite elements I don't believe in the 80% figure. Just
> think: don't you need heterogeneous arrays all the time? For class-wide
> programming for example? And logical records? And words, texts,
> pictures, all sort of variable length stuff?
But in almost all of these cases, I would not want to be copying
these large objects around. I (as a user of the abstraction) would want
to control storage allocation of the objects. That would imply
I would be using access types explicitly, or define an abstraction
which used a controlled type, with perhaps reference counting
of a pointed-to part.
> BTW this is the kind of "resistance" I was talking about. No technical
> arguments really. Just a vague downsize wish. The pointer tradition maybe.
Sorry if my arguments seem vague. I would be happy to engage
in a long discussion about this design choice. I would want
the container to take over storage allocation only in the
case where it is "uniquifying" the objects, and I expect
to "leave" the objects in the container indefinitely, and pass
around keys (essentially pointers or ids) for the objects.
The example of the "string table" comes to mind, where in
a word or language processing tool, the first thing you do
is uniquify all the strings, and then only deal with indices
into the string table thereafter. This sort of table generally
never goes away, and just grows slowly as new unique strings occur.
The string mapping was included precisely for this application,
as it seems important and common. However, for other cases, we
felt it was better to let the programmer control storage allocation,
so that the amount of allocation, copying, and deallocation of large,
variable-sized objects could be minimized, and most importantly,
under control of the user.
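[Editor's note: The "string table" idiom described above can be sketched as follows. This is only an illustration, not part of the proposal under discussion: it assumes the Indefinite_Hashed_Maps and Indefinite_Vectors packages of later standard Ada (2005 and after), and the names String_Table_Demo, Id_Maps, String_Vectors, and Intern are hypothetical.]

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Containers.Indefinite_Vectors;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure String_Table_Demo is

   --  Hypothetical names throughout; only the Ada.* units are standard.
   package Id_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,
      Element_Type    => Positive,
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");

   package String_Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type   => Positive,
      Element_Type => String);

   Ids   : Id_Maps.Map;            --  string -> id
   Names : String_Vectors.Vector;  --  id -> string

   --  Return the id of S, adding S to the table on first sight.
   function Intern (S : String) return Positive is
      Position : constant Id_Maps.Cursor := Ids.Find (S);
   begin
      if Id_Maps.Has_Element (Position) then
         return Id_Maps.Element (Position);
      end if;
      Names.Append (S);
      Ids.Insert (S, Names.Last_Index);
      return Names.Last_Index;
   end Intern;

   A : constant Positive := Intern ("begin");
   B : constant Positive := Intern ("end");
   C : constant Positive := Intern ("begin");

begin
   --  A and C name the same table entry; only the ids are passed around.
   Ada.Text_IO.Put_Line (Boolean'Image (A = C));
   Ada.Text_IO.Put_Line (Names.Element (B));
end String_Table_Demo;
```

[The table owns the storage for every string and never shrinks; client code handles only the Positive ids, which is the usage pattern the message describes.]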
Please don't confuse "resistance" with simply a difference of
opinion. We spend long hours debating incredible minutiae
in the ARG meetings. We rarely take the "easy" route.
We may not document our discussions publicly as well as we
should, but rest assured we have a vigorous debate.
The minutes of ARG meetings, which tend to be very good relative
to most minutes I have seen, are nevertheless able to document
only the "tip of the iceberg" of the discussion.
****************************************************************
From: Marius Amado Alves
Sent: Sunday, February 8, 2004 6:55 PM
Thanks for taking the trouble to review this issue. I'll try to summarize:
You feel the user wants to control allocation himself. Sometimes, yes. In
those times, he just does it. The indefinite element feature won't stand
in his way. I feel most of the time the user does NOT want to bother
with memory management. He will love to have indefinite elements. I
think this is the principal difference between us. You think all or most
users prefer to control allocation themselves. I'm convinced they don't,
and they'd be really happy not to have to.
You fear loss of efficiency due to copying. Containers are by-reference,
so you must be referring to copying of elements. But doesn't that happen
just exactly when it has to, be it in the library or in the user code?
Assuming a well designed library, one which moves only references, not
the things, as you yourself notice. I've done proof-of-concept
implementations of this for alternative 2. The process and associated
discussion with Matt was recorded on the ASCLWG forum. The code is still
online I think, but needs cleansing.
****************************************************************
From: Jeffrey Carter
Sent: Monday, February 9, 2004 1:01 AM
Randy Brukardt wrote:
>
> Huh? You've said, in effect, that the performance isn't good enough
> for applications where the performance doesn't matter. That's a
> pretty goofy statement!
Actually, you originally said something like that. You have said
1. That the vector component should only be used by applications where
performance doesn't matter.
2. That the difference in performance between possible implementations
of vector may be critical to applications that use it.
If performance doesn't matter to these applications, then the
restriction on implementations should be removed. However, I agree with
you that even applications that are suitable for the use of standard
components may find the performance difference between different
implementations critical.
> The problem I see is a lot of people are looking far too closely at
> tiny pieces of abstractions. You might have a queue or a list as
> part of a large abstraction, but they're pretty much useless by
> themselves. And given that creating a queue or stack (both of which
> have only two operations, both trivial!) would take 3 minutes max, it
> makes no sense to use a complex (and necessarily slow) container
> library for just that -- indeed, it probably would be more work to
> use a container than the 3 minutes.
I have seen a number of these "3-min" structures, and many of them have
subtle errors. These are not beginner mistakes, either; handling dynamic
structures seems to be something that a segment of developers have
difficulty understanding. That these structures are not as easy to
implement as they seem is part of the reason why I think a list
component should be part of a standard library.
Regarding Size and Resize, you wrote:
> That's no different than many of the attributes in Ada, which (if set),
> always return the values that they were set to. But what the compiler does
> with those values is (almost) completely implementation-defined.
There is a difference between a compiler directive and an operation of a
package. The latter must have well defined behavior that is not
implementation defined.
> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.
What does Size return if Resize has not been called?
This description does not agree with the specification in the proposal.
Size "Returns the length of the internal array." Clearly the
implementation must have something that has a length, independent of the
logical length of the value stored in the vector, for Size to return.
Resize "allocates a new internal array whose length is at least the
value Size". Clearly the implementation must allocate a new something with
a new length. What the container does with the new size is not up to it;
it is specified fairly clearly.
The operations, as specified, are pretty meaningless except for an array
implementation.
If the intention is as you described, then the operations appear to be
useless, and should be eliminated. If the intention is as specified,
then these operations are too tied to the implementation, and should be
eliminated.
> I much prefer the vision of this containers library, where the only
> containers included are those that are large, complex, multi-purpose,
> and have a clear abstraction.
The vision I see seems to be muddied. The containers are poorly named,
poorly specified, and confuse abstractions with their implementations.
My intention is to help assure that Ada has as good a container library
as possible in the time available. I assume that the purpose of
presenting the proposal to the Ada-Comment list is to attract comments
on how it could be improved, and there is time to make such comments and
have them considered. I have invested most of this weekend in describing
specific ways I think they could be improved. In many cases I have
provided concrete suggestions for alternative wording, which I
present here. I hope the result will be useful to the committee.
I have already presented my thoughts on changing the type names used to
be consistent with the rest of the standard. I will use the type names
from the proposal here, however, to avoid confusion.
Vectors
The introductory text to Vectors does not make it clear that this is an
extensible array (EA). After reading the package spec, I initially
thought this was a list, perhaps with an unusual implementation. I doubt
if I am special, so I expect such an interpretation from many readers.
After reading the entire section, I encountered the Implementation
Advice that a vector is similar to an array and realized that this was
an EA. An EA is a useful component that I will be happy to see in the
standard.
However, I think it is a disservice to Ada for readers to have to read
the entire section to know what they're looking at. Borrowing from the
introductory text for Strings.Unbounded, which is a special case of an
extensible array, I suggest something along the lines of: "An object of
type Vector_Type represents an array, indexed by Index_Type with
components of Element_Type, whose low bound is Index_Type'First and
whose length can vary conceptually between 0 and the number of values in
Index_Type."
The wording used by Strings.Unbounded should serve as a guide to how to
word the text here. Operations in Strings.Unbounded are defined by
analogy to String operations; operations in Vectors should be defined by
analogy to array operations.
Even with such wording changes, however, it is still going to be
difficult for the reader to find what he wants. Someone looking for
vectors is going to be disappointed to find EAs, and someone looking for
an EA is unlikely to look at something named Vectors. Ada should be able
to do better than that. Extensible_Arrays, Flexible_Arrays, and
Unbounded_Arrays have already been suggested by various people here;
given that we already have Unbounded_Strings, Unbounded_Arrays may be
the best choice.
I am not the first to note that Annex A is one of the most accessible
parts of the ARM, and is frequently read by those using the standard
library. It makes sense to recognize this and word these sections as a
users' guide where possible. So, if the ARM gains a mathematical library
of matrices and vectors, we should add to it a comment that those
looking for the kind of vector provided by the STL of C++ or Java's
library should look at package Ada.Containers.Unbounded_Arrays (A.17.2).
In the introductory text to the section, we should mention that an
Unbounded_Array is equivalent to the container called Vector in the STL
of C++ or Java's library (similar to the comment about pointers in 3.10).
Index_Subtype is never used, so it should be eliminated.
Size and Resize were discussed above.
First (Vector) is always Index_Type'First, so it should be a constant.
We iterate over an array A by
   for I in A'range loop
      -- use A (I)
   end loop;
By analogy, we should iterate over an EA by
   for I in First .. Last (EA) loop
      -- use Element and Replace_Element at I
   end loop;
Front and Back, therefore, seem to be unnecessary, and may be deleted.
This has the additional advantage that it eliminates concern about
Index_Type'Base needing a greater range than Index_Type, and we could
remove the assertion.
Writing prematurely when I thought this was a list, I suggested an
iterator for vectors. I retract that suggestion.
It could be useful to provide an operation to add an item at an index >
Index_Type'Succ (Last (Vector) ) without assigning to the intervening
positions. The component doesn't currently allow this. Possible wording:
   procedure Append (Vector   : in out Vector_Type;
                     Index    : in     Index_Type;
                     New_Item : in     Element_Type);
If Index <= Last (Vector), this procedure has the same effect as
Replace_Element (Vector, Index, New_Item).
Otherwise, the length of Vector is extended so that Last (Vector) =
Index, and New_Item is assigned to the element at Index. No value is
assigned to the elements at the new positions with indices in
Index_Type'Succ (Last (Vector) ) .. Index_Type'Pred (Index).
There should be some way to indicate that this last use of "Last
(Vector)" refers to the value before the call. I don't see an easy way
to do that and welcome suggestions.
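[Editor's note: A minimal sketch of the suggested operation, written against Ada.Containers.Vectors as later standardized rather than the draft Vector_Type. The names Append_At_Demo, Int_Vectors, and Append_At are hypothetical, and the sketch assumes Index_Type'First = 1 so that the required length equals Index.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Append_At_Demo is

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   --  Hypothetical operation: replace at Index if it exists, otherwise
   --  extend the vector so that Last_Index = Index and store New_Item.
   procedure Append_At
     (Vector   : in out Int_Vectors.Vector;
      Index    : in     Positive;
      New_Item : in     Integer) is
   begin
      if Index > Vector.Last_Index then
         --  With Index_Type'First = 1, the new length equals Index.
         --  Set_Length leaves the intervening positions as empty
         --  elements; no value is assigned to them.
         Vector.Set_Length (Ada.Containers.Count_Type (Index));
      end if;
      Vector.Replace_Element (Index, New_Item);
   end Append_At;

   V : Int_Vectors.Vector;

begin
   V.Append (10);
   Append_At (V, 4, 40);   --  extends the vector to length 4
   Append_At (V, 1, 11);   --  same effect as Replace_Element
   Ada.Text_IO.Put_Line (Ada.Containers.Count_Type'Image (V.Length));
   Ada.Text_IO.Put_Line
     (Integer'Image (V.Element (1)) & Integer'Image (V.Element (4)));
end Append_At_Demo;
```

[Because the comparison with Last_Index happens before the length changes, the sketch sidesteps the wording problem raised above about which value of "Last (Vector)" is meant. Positions 2 and 3 are deliberately never read.]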
This leaves the problem that Natural is used for the length of a vector
and the counts of inserted or deleted elements, meaning that index types
with more values than Natural cannot use some index values. This is
avoided in Ada.Text_IO, for example, with a type specific for that purpose.
However, this is really a general problem, and a general solution might
be advisable. There are no predefined modular types in Standard, so we
might want to add
type Maximal_Count is mod implementation-defined;
Maximal_Count'Modulus is the largest power of 2 that may be used as the
modulus of a modular type.
We could add a note that this means Maximal_Count'Modulus =
System.Max_Binary_Modulus, for clarity. I presume it would be
inappropriate to reference System in Standard.
If that's not acceptable, we could add somewhere in the hierarchy,
perhaps in package Ada itself
type Maximal_Count is mod System.Max_Binary_Modulus;
[Would we also like subtype Positive_Maximal_Count?]
New packages could then use Maximal_Count rather than Natural for this
sort of thing. Existing packages could be augmented with parallel
operations that use Maximal_Count.
Maps
Maps is fairly well specified. I think the introductory wording should
again be modified: "The user can insert key/value pairs into a map, and
then search for and delete values by specifying the key. An object of
type Map_Type allows searching for a key in less than linear time."
This is a hashed map and specifies an implementation based on a hash
table. This is appropriate, since a hashed map requires the user to
provide a hash function that is not needed by other implementations.
However, I think the name should reflect this (Hashed_Maps) so that we
don't unnecessarily restrain the existence of other forms of Maps.
Since the exact nature of the underlying hash table is implementation
defined, the user doesn't have the information needed to choose an
appropriate size for it. Size and Resize therefore seem inappropriate. I
can hope that users will realize they lack the information to use them
meaningfully, and never call them.
The initial text after the spec seems unnecessarily restrictive of the
implementation. Since the implementation knows best the details of the
hash table, it should determine the initial size of the table.
I agree with the open issue on Swap. I see little use for this operation
on any of the components.
It seems inappropriate to require Insert to resize the hash table. The
implementation should know best when and how to resize the table.
While it's appropriate to discuss nodes as containers of key/value
pairs, it unnecessarily restricts the implementation to talk of nodes
being allocated and deallocated. It should be adequate to say such
things as "Insert adds a new node, initialized to Key and New_Item, to
Map" and "Delete deletes the node from Map".
I don't understand why the string-keyed maps exist, since they are
equivalent to a map with an unbounded string key. The implementation
would have to store the provided key in an appropriate unbounded string,
or duplicate the functionality of unbounded strings. Duplicated
functionality is a bad idea. Moving the conversions to and from
unbounded strings into a special component doesn't seem worth the added
complexity.
Sorted_Sets
The wording here is similar to that for vectors. The introductory text
does not describe the abstraction that the package implements.
Proceeding to the package spec, the reader will probably be puzzled by
the lack of basic set operations such as union and intersection. The
description of the operations that follows does nothing to alleviate the
confusion. A newcomer to the language may very well wonder what's wrong
with these Ada people. Only at the very end of the section do we discover
that this is a structure that provides searching in O(log N) time.
Clearly the choice of Set as the name is confusing and misleading, but
I'm not sure what to suggest as an alternative. Something like
Fast_Search seems to imply that it is an algorithm, not a structure.
Perhaps Sorted_Searchable_Structure would work, but I'm not very happy
with it. Suggestions are welcome.
The introductory text needs to identify what the component is: "An
object of type Searchable_Structure represents a data structure that can
be searched in less than linear time."
Given that this is a searchable structure, the operations seem reasonable.
The descriptions of the operations clearly require an implementation
that performs dynamic allocation and deallocation. This is an
unnecessary constraint on the implementation. A binary search is O(log
N), but is not allowed by the current specification. These descriptions
should be modified along similar lines to the suggestions for maps.
If the package does not use "=" for elements, why does it import it? Why
doesn't the package use "="? It's not clear why it should use
"equivalence" rather than equality.
The package Generic_Keys turns a searchable structure into a map. A
searchable structure is a common implementation of a map. Providing an
alternative implementation of a map is fine, provided that the name
indicates that it is a map. Sorted_Map might be a better name.
It's quite easy to implement a map with a searchable structure
component, so it would be better if the map was another component at the
same level as the hashed map. I would have no objection to the standard
specifying that this map be implemented with an instantiation of the
searchable structure component; it would make the specification of the
map easy. The primary justifications for this change are that it allows
the user who wants a map based on a searchable structure to obtain it
with a single instantiation, rather than the two required as it stands,
and it allows both maps to have similar interfaces, which they do not
have with the existing proposal.
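[Editor's note: The "two instantiations" point can be illustrated with Ada.Containers.Ordered_Sets and its nested Generic_Keys package as later standardized; the draft Sorted_Sets package differs in detail. Keyed_Set_Demo, Pair, Pair_Sets, Key_Of, and By_Key are hypothetical names used only for this sketch.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Keyed_Set_Demo is

   --  The element carries its own key.
   type Pair is record
      Key   : Positive;
      Value : Integer;
   end record;

   --  Elements are ordered by key only.
   function "<" (Left, Right : Pair) return Boolean is
   begin
      return Left.Key < Right.Key;
   end "<";

   --  First instantiation: the searchable structure itself.
   package Pair_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Pair);

   function Key_Of (Element : Pair) return Positive is
   begin
      return Element.Key;
   end Key_Of;

   --  Second instantiation: the key-based (map-like) view.
   package By_Key is new Pair_Sets.Generic_Keys
     (Key_Type => Positive, Key => Key_Of);

   S : Pair_Sets.Set;

begin
   S.Insert ((Key => 2, Value => 20));
   S.Insert ((Key => 1, Value => 10));

   --  Look up by key alone, as one would with a map.
   if By_Key.Contains (S, 2) then
      Ada.Text_IO.Put_Line (Integer'Image (By_Key.Element (S, 2).Value));
   end if;
end Keyed_Set_Demo;
```

[A dedicated sorted-map package, as suggested in the message, would fold both instantiations into one and could share an interface with the hashed map.]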
I'm glad the proposal recognizes that both searchable structures and
maps based on them are useful components, even if they go to great
efforts to disguise what they are.
This discussion of the searchable structure and the map based on it
seems to indicate a basic design problem with the hashed map component.
A hash table is not trivial to implement correctly. There are uses for
hash tables other than maps. As it stands, the user who wants a hash
table must create one, duplicating the effort performed for the map, and
increasing the likelihood of errors.
Just as both a searchable structure and a map based on it are desirable,
so both a hash table and a map based on it would be a good idea. The
user who requires a hash table but not a map could use one that has been
tested by many users, reducing both effort and likelihood of errors.
Thus I suggest that the hash table be turned into a component. As with
the map based on a searchable structure, I would have no problem with
the standard specifying that the hashed map be implemented using the
hash table component.
If we can only have one of the hash table or the hashed map components,
I would argue for the hash table, since it is easy to implement a map
given a hash table, but difficult to implement a hash table given a map.
Providing maps based on other packages allows the standard to
demonstrate a layered approach to creating abstractions. Since creating
useful abstractions is a basic process in software engineering, perhaps
the idea might rub off on some readers.
If this suggestion is accepted, the library would increase from three to
five: an extensible array, a hash table, a searchable structure, a map
based on the hash table, and a map based on the searchable structure.
That still seems a fairly minimal library, provides the same
functionality as the proposal, and adds some additional useful
functionality without significant extra effort.
****************************************************************
From: Martin Krischik
Sent: Monday, February 9, 2004 5:40 AM
> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...
Actually some operations will become faster, like insert in the middle. Also,
append operations which need to extend internal storage become faster.
At least when the stored data is larger than an access - which should be
80% of the cases.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 8:39 AM
Randy Brukardt wrote:
> Huh? Resize tells the container a reasonable size to use; what the container
> does with that information is up to it. Size simply returns that
> information.
It returns the value chosen by the implementation, which can be at least
the size specified.
> The only real requirement here is O(1) element access (which prevents the
> use of a straight linked list).
Yes, that is correct: you cannot use a linked list to implement a vector.
Indeed, if a vector container were implemented as a linked list then it
wouldn't be named "vector"; it would be named "linked list" instead.
My original proposal had 3 kinds of sequence containers: vectors,
deques, and (linked) lists. There were 3 because each has different
time and space properties.
I would have liked having a list container in the final committee
report, since that's the most natural container for use as a queue. (I
probably use lists more often than any other container, for exactly that
reason.) But the size of the proposal had to be reduced somehow.
> Janus/Ada will probably use an array of pointers (or possibly array of
> arrays of pointers); we're going to be (implicitly) allocating the elements
> anyway, we might as well do it explicitly and take advantage of that to make
> Insert/Delete/Sort (and any expansions) much cheaper (presuming the elements
> are bigger than scalar types). An array of arrays of pointers is even
> better, because insertion cost is bounded by the maximum size of an array
> chunk -- but there is more overhead and complexity, so I'd like to see some
> real uses before deciding on an implementation.
My reference implementation just uses an unbounded array internally. It
sounds like you have some other implementation ideas.
I have the maps done, and I'll host the new reference implementation
this morning (Mon, 9 Feb).
> Note that a pure list component has no real opportunity for "better"
> implementations, and indeed, any implementation on Janus/Ada would suffer
> from "double" allocation.
But a list component has O(1) insertion and deletion at any position. A
vector is O(1) only at the back end.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:09 AM
Martin Dowie wrote:
>>The only sequence container in the proposal is a vector, which doesn't
>>have a passive iterator. Again, I recommend just using a loop:
>
> I suspect the first thing I will do is add an extra child generic subprogram
> Ada.Containers.Vectors.Iterate! :-)
You might not have to. Since there seems to be interest, I added the
following two declarations to the reference implementation:
   generic
      with procedure Process
        (Element : in Element_Type) is <>;
   procedure Generic_Constant_Iteration
     (Vector : in Vector_Type);

   generic
      with procedure Process
        (Element : in out Element_Type) is <>;
   procedure Generic_Iteration
     (Vector : in Vector_Type);
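[Editor's note: For comparison, a sketch of active and passive iteration over a vector. It uses Ada.Containers.Vectors as later standardized, where the passive iterator is the Iterate procedure taking an access-to-procedure rather than the generic procedures shown above. Iteration_Demo, Int_Vectors, and Show are hypothetical names.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Iteration_Demo is

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Int_Vectors.Vector;

   --  Called once per element by the passive iterator.
   procedure Show (Position : in Int_Vectors.Cursor) is
   begin
      Ada.Text_IO.Put_Line (Integer'Image (Int_Vectors.Element (Position)));
   end Show;

begin
   V.Append (1);
   V.Append (2);

   --  Active: the caller drives the loop and indexes the vector.
   for I in V.First_Index .. V.Last_Index loop
      Ada.Text_IO.Put_Line (Integer'Image (V.Element (I)));
   end loop;

   --  Passive: the container drives the loop and calls Show.
   V.Iterate (Show'Access);
end Iteration_Demo;
```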
The latest version of the reference implementation is available at my
home page:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:14 AM
Martin Krischik wrote:
>>The user can easily code a queue in terms of a Vector (that's one of the
>>uses of Insert!). We dropped the list component because it had an identical
>>interface to the Vector component, but was less flexible (no computed O(1)
>>access).
>
> True enough. But if you wanted to build a generic queue on top of the vector the
> tag should not be hidden from view. Otherwise one needs to repeat all the
> access methods instead of just renaming the ones provided from the parent
> package.
>
> In fact the hidden tag is the one feature which I really dislike in Charles.
You mean the type tag? The components are tagged because I needed
controlledness for automatic memory management. They are tagged for no
other reason, and Charles is specifically designed using static, not
dynamic, polymorphism.
For the record I don't think it's realistic to use a vector as a queue
anyway, since deletion from the front end of a vector is O(n).
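[Editor's note: A minimal sketch of a queue built on a linked list, where removal from the front does not shift the remaining elements. It assumes Ada.Containers.Doubly_Linked_Lists from later standard Ada, which is not part of the proposal discussed here; Queue_Demo and Int_Lists are hypothetical names.]

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Queue_Demo is

   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists
     (Element_Type => Integer);

   Queue : Int_Lists.List;

begin
   --  Enqueue at the back.
   Queue.Append (1);
   Queue.Append (2);
   Queue.Append (3);

   --  Dequeue from the front; no other element moves.
   while not Queue.Is_Empty loop
      Ada.Text_IO.Put_Line (Integer'Image (Queue.First_Element));
      Queue.Delete_First;
   end loop;
end Queue_Demo;
```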
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:22 AM
Martin Krischik wrote:
> Passive Iterators should always provide the fastest means to iterate over the
> whole container. They should do so by knowing the internals of the container.
That is correct. A passive iterator will usually beat an active
iterator. But for a vector it probably doesn't make any difference.
However, the latest reference implementation does have passive iterators
for the vector, that look like this:
   generic
      with procedure Process
        (Element : in Element_Type) is <>;
   procedure Generic_Constant_Iteration
     (Vector : in Vector_Type);

   generic
      with procedure Process
        (Element : in out Element_Type) is <>;
   procedure Generic_Iteration
     (Vector : in Vector_Type);
The latest version of the reference implementation is available at my
home page:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
> Of course it only matters in advanced containers with B-Trees or AVL-Trees as
> internal structure. But I have only seen those in IBM's Open Class Library
> (which is far better than the STL).
>
> But there are no advanced containers in AI 302.
The sorted set is implemented using a balanced tree. The reference
implementation uses a red-black tree, but I suppose an AVL tree would
work too.
The maps are implemented using a hash table.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:30 AM
Stephen Leake wrote:
> What is the rationale for making the Map Key_Type definite, as opposed
> to indefinite? Since an indefinite Key_Type is required for
> Containers.Maps.Strings, why not make that capability available to the
> users?
Because that would punish users that have definite key types.
Also, type String isn't just any indefinite type. It's an array.
The reference implementation for String_Maps looks like this:
   type Node_Type;
   type Node_Access is access Node_Type;

   type Node_Type (Key_Length : Natural) is
      record
         Key     : String (1 .. Key_Length);
         Element : aliased Element_Type;
         Next    : Node_Access;
      end record;
> I don't see a discussion of this in AI-302-03/01.
There is a paragraph in there explaining why we have a dedicated map
whose key type is String.
> Another point: Containers.Vectors.Size should return Index_Type'Base,
> and the Size parameter in Resize should also be Index_Type'Base. It's
> confusing to have different types for Size and Index.
No. The parameter of the Resize operation specifies a hint about the
future length of the container, which is subtype Natural.
> There's also a problem if Natural'Last < Index_Type'Last; you
> can't have a vector that contains every index!
The assumption is that a container will always have fewer than
Integer'Last elements. (On a 32-bit machine that's about 2.1
billion values...)
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:34 AM
Randy Brukardt wrote:
> We definitely expect that the strings container will use a purpose-built
> data structure for storing strings, not some general indefinite item
> capability. Ways to compactly and efficiently store sets of varying size
> strings are well known and commonly used.
I didn't do anything special here. The internal node declaration for
String_Maps looks like this:
   type Node_Type;
   type Node_Access is access Node_Type;

   type Node_Type (Key_Length : Natural) is
      record
         Key     : String (1 .. Key_Length);
         Element : aliased Element_Type;
         Next    : Node_Access;
      end record;
I have hosted the latest version of the reference implementation at my
home page:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:49 AM
Randy Brukardt wrote:
>>There's also a problem if Natural'Last < Index_Type'Last; you
>>can't have a vector that contains every index!
>
> Yes, that's a serious problem on Janus/Ada (Integer is 16-bit). However, you
> want the Size and Resize operations to take a numeric type that contains
> zero -- and certainly Index_Type is not that. Index_Type could be a subtype
> of an enumeration type or a subtype of a modular type (neither of which can
> contain zero) or a subtype of an integer type not containing zero.
>
> We had a short, inconclusive discussion about whether the index type ought
> to be range <> rather than (<>) (because enumeration and modular types fail
> the assertion and thus aren't directly usable), but that still doesn't
> guarantee a zero. Moreover, if the integer type has negative numbers, then
> the Length of the vector could be larger than Index_Type'Last.
Clearly, if the container is empty, and Index_Type'Base'First =
Index_Type'First, then evaluation of function Last will raise
Constraint_Error.
The issue is whether elaboration of a vector container object can raise
CE if the Index_Type'Base'First = Index_Type'First.
There's no reason why we should punish users whose generic actual index
subtype has Index_Type'Base'First = Index_Type'First, since they can
always defend against CE like this:
if not Is_Empty (V) and then Last (V) = X then
In fact my reference implementation doesn't require that
Index_Type'Base'First < Index_Type'First, so the assertion in the spec
is somewhat spurious.
I would prefer to weaken the precondition and allow
Index_Type'Base'First = Index_Type'First, but it's really up to
implementors, because allowing that condition will constrain
implementation choices.
> So I don't see a great solution. I wondered about using "Hash_Type" here (it
> has the correct properties), but that seems like a misuse of the type (and a
> bad idea in a library that most Ada programmers will read - you want to show
> them good style in standard libraries).
As I mentioned in my previous message, Resize specifies a hint about the
future number of elements in --that is, the length of-- the container.
My assumption is that no container will ever have more than Integer'Last
number of elements.
If that assumption is incorrect, then maybe the container can be allowed
to grow internally to more than Integer'Last number of elements, but can
only report a maximum value of Integer'Last.
Subtype Natural is the correct choice for the vector Resize operation.
I think the ARG wants to use Hash_Type for Resize for the maps. My
reference implementation still uses Natural.
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 4:40 PM
> Clearly, if the container is empty, and Index_Type'Base'First =
> Index_Type'First, then evaluation of function Last will raise
> Constraint_Error.
Well, some might think it's clear, but some might think Last returns
First-1, which for a modular type is 'Last. I'm in favor of making the
Index_Type be "range <>", and also requiring that elaboration of an
instance raise an exception if 'First = 'Base'First. That would avoid
all these anomalies.
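[Editor's note: The modular-type anomaly mentioned above is easy to demonstrate. In this sketch the type Mod_Index and the procedure name are hypothetical; the wraparound itself is standard modular arithmetic.]

```ada
with Ada.Text_IO;

procedure Modular_Index_Demo is

   --  Hypothetical index type: values 0 .. 7.
   type Mod_Index is mod 8;

   First : constant Mod_Index := Mod_Index'First;

begin
   --  For a modular type, First - 1 wraps around to 'Last instead of
   --  raising Constraint_Error, so "Last of an empty vector" would
   --  look like the index of a full one.
   Ada.Text_IO.Put_Line (Boolean'Image (First - 1 = Mod_Index'Last));
end Modular_Index_Demo;
```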
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 6:24 PM
That seems reasonable. It was questionable whether we really needed
type Index_Type is (<>);
so maybe these issues will require that
type Index_Type is range <>;
This is probably good enough.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:53 AM
Randy Brukardt wrote:
> So, a passive iterator will only be faster in complex containers (where you
> have to separate the Element and Successor functions). For a Vector (where
> the language already has the needed iteration mechanism built-in), it's
> going to be slower (or, if you're really lucky, the same speed) and it
> certainly is a lot harder to write.
>
> So I think having it on Vector would simply be for consistency; you'd never
> actually use it if you know you're dealing with a Vector.
As I mentioned in one of my previous messages, the reference
implementation now has a passive iterator like this:
   generic
      with procedure Process
        (Element : in Element_Type) is <>;
   procedure Generic_Constant_Iteration
     (Vector : in Vector_Type);

   generic
      with procedure Process
        (Element : in out Element_Type) is <>;
   procedure Generic_Iteration
     (Vector : in Vector_Type);
There seems to be interest in passive iterators for vectors, so we
might as well include them.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 10:00 AM
Randy Brukardt wrote:
> At which point, you *equal* the performance of the active iterator. And only
> if *everything* goes right. The OP claimed that the passive iterator would
> always have better performance, and that's certainly not true for the vector
> container. I doubt that it would be true for the Map container, either. It
> could be true for a complex container, but those aren't commonly used.
The vector is arguably a borderline case, but we should just include a
passive iterator. The latest version of the reference implementation
has them for vectors, too.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
For both a (hashed) map and (sorted) set, a passive iterator is likely
to beat an active iterator (other things being equal, of course).
For a map, the reason is that you can just use a loop internally, to
keep track of which bucket you're visiting. In an active iterator, you
have to compute the hash value again to find the next bucket.
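[Editor's note: A minimal sketch of the argument above. It assumes a hypothetical chained hash table of Natural keys in which a cursor designates only a node; all names are illustrative and this is not the reference implementation. The passive iterator keeps the bucket number in a loop variable, while the active Succ has to hash the key again whenever it falls off the end of a chain.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Iteration_Sketch is

   Bucket_Count : constant := 4;

   type Node_Type;
   type Node_Access is access Node_Type;
   type Node_Type is record
      Key  : Natural;
      Next : Node_Access;
   end record;

   subtype Bucket_Index is Natural range 0 .. Bucket_Count - 1;
   type Bucket_Array is array (Bucket_Index) of Node_Access;

   Buckets : Bucket_Array;  --  access components start out null

   function Hash (Key : Natural) return Bucket_Index is
   begin
      return Key mod Bucket_Count;
   end Hash;

   procedure Insert (Key : Natural) is
      B : constant Bucket_Index := Hash (Key);
   begin
      Buckets (B) := new Node_Type'(Key => Key, Next => Buckets (B));
   end Insert;

   --  Passive iterator: the loop variable B remembers the bucket.
   generic
      with procedure Process (Key : Natural);
   procedure Generic_Iteration;

   procedure Generic_Iteration is
      Node : Node_Access;
   begin
      for B in Buckets'Range loop
         Node := Buckets (B);
         while Node /= null loop
            Process (Node.Key);
            Node := Node.Next;
         end loop;
      end loop;
   end Generic_Iteration;

   --  Active iterator: First and Succ operate on a bare node.
   function First return Node_Access is
   begin
      for B in Buckets'Range loop
         if Buckets (B) /= null then
            return Buckets (B);
         end if;
      end loop;
      return null;
   end First;

   function Succ (Node : Node_Access) return Node_Access is
   begin
      if Node.Next /= null then
         return Node.Next;
      end if;
      --  End of chain: re-hash the key to learn which bucket we are in.
      for B in Hash (Node.Key) + 1 .. Bucket_Index'Last loop
         if Buckets (B) /= null then
            return Buckets (B);
         end if;
      end loop;
      return null;
   end Succ;

   Passive_Sum : Natural := 0;
   Active_Sum  : Natural := 0;

   procedure Add (Key : Natural) is
   begin
      Passive_Sum := Passive_Sum + Key;
   end Add;

   procedure Sum_All is new Generic_Iteration (Process => Add);

   Cursor : Node_Access;

begin
   for K in 1 .. 10 loop
      Insert (K);
   end loop;

   Sum_All;

   Cursor := First;
   while Cursor /= null loop
      Active_Sum := Active_Sum + Cursor.Key;
      Cursor := Succ (Cursor);
   end loop;

   --  Both traversals visit every key exactly once.
   pragma Assert (Passive_Sum = 55 and Active_Sum = 55);
   Put_Line ("passive =" & Natural'Image (Passive_Sum)
             & ", active =" & Natural'Image (Active_Sum));
end Iteration_Sketch;
```

A cursor that also stored its bucket number would avoid the extra hash, at the price of a larger cursor; the sketch does not try to settle that trade-off.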
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 10:15 AM
>I suspect the first thing I will do is add an extra child generic
>subprogram Ada.Containers.Vectors.Iterate! :-)
This probably won't be necessary. I added passive iterators to the
vector reference implementation.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 10:13 AM
> And, on most implementations, I would expect it to make it *many* times
> slower. (It wouldn't have any effect on Janus/Ada, I don't think, because we
> already have to allocate an element at a time anyway.) I would guess that it
> is that efficiency concern that Matt is responding to. But I'll let him
> respond himself...
The reason is that (in what I imagine is a typical implementation)
allowing the key to be indefinite would have drastic performance
implications.
The internal node of the map reference implementation looks like this:
   type Node_Type;
   type Node_Access is access Node_Type;

   type Node_Type is
      record
         Key     : aliased Key_Type;
         Element : aliased Element_Type;
         Next    : Node_Access;
      end record;
I can declare the key as a record component directly, because the formal
key type is definite. Were we to allow indefinite key types, then we
would have to do something like:
   type Node_Type;
   type Node_Access is access Node_Type;

   type Key_Access is access Key_Type;

   type Node_Type is
      record
         Key     : Key_Access;
         Element : aliased Element_Type;
         Next    : Node_Access;
      end record;
which implies allocating the key object separately from allocation of
the node itself. This would unfairly punish users that have a definite
actual key type (as Integer or whatever).
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
If you want an indefinite key type, then allocate the key object
yourself and instantiate the component using the key access type. This
shouldn't be a problem since the map object is typically part of some
higher-level abstraction anyway, so you can hide the allocation and map
manipulation from the users of that higher-level abstraction.
See the !examples section of the proposal for more details.
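[Editor's note: A minimal sketch of the approach described above: the user allocates the String keys and instantiates a map with the key access type, hiding both behind Insert and Lookup. It is written against Ada.Containers.Hashed_Maps and Ada.Strings.Hash as later standardized in Ada 2005, whose names differ from the proposal discussed in this thread; Key_Access, Hash_Key, Same_Key, Insert and Lookup are hypothetical names.]

```ada
with Ada.Containers.Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Unchecked_Deallocation;

procedure String_Key_Sketch is

   type Key_Access is access String;

   procedure Free is
      new Ada.Unchecked_Deallocation (String, Key_Access);

   --  Hash and compare the designated strings, not the access values.
   function Hash_Key (Key : Key_Access) return Ada.Containers.Hash_Type is
   begin
      return Ada.Strings.Hash (Key.all);
   end Hash_Key;

   function Same_Key (Left, Right : Key_Access) return Boolean is
   begin
      return Left.all = Right.all;
   end Same_Key;

   package Key_Maps is new Ada.Containers.Hashed_Maps
     (Key_Type        => Key_Access,
      Element_Type    => Integer,
      Hash            => Hash_Key,
      Equivalent_Keys => Same_Key);

   Table : Key_Maps.Map;

   --  The higher-level abstraction: callers see only String.
   procedure Insert (Name : String; Value : Integer) is
      Key      : Key_Access := new String'(Name);
      Position : Key_Maps.Cursor;
      Inserted : Boolean;
   begin
      Key_Maps.Insert (Table, Key, Value, Position, Inserted);
      if not Inserted then
         Free (Key);  --  an equivalent key was already present
      end if;
   end Insert;

   function Lookup (Name : String) return Integer is
      Probe  : Key_Access := new String'(Name);
      Result : Integer;
   begin
      --  Raises Constraint_Error if Name is absent; in that case this
      --  sketch leaks Probe.
      Result := Key_Maps.Element (Table, Probe);
      Free (Probe);
      return Result;
   end Lookup;

begin
   Insert ("alpha", 1);
   Insert ("beta", 2);
   Insert ("alpha", 99);  --  duplicate key: ignored, its copy is freed

   pragma Assert (Lookup ("alpha") = 1);
   pragma Assert (Lookup ("beta") = 2);

   --  Keys still owned by Table are not freed here; a fuller
   --  abstraction would release them when the table is finalized.
end String_Key_Sketch;
```

The allocation on every lookup is the price of this layering, which is part of what the later messages in this thread argue about.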
****************************************************************
From: Simon J. Wright
Sent: Monday, February 9, 2004 11:37 AM
> The internal node of the map reference implementation looks like this:
Does the aliasing of Element carry any implications for Element_Type?
I am thinking of the use of discriminated types, even with defaulted
discriminants, where aliasing forces the object to be constrained.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 11:48 AM
It means you can't instantiate the container using a
default-discriminated element type.
This is the same problem you have when trying to declare a
default-discriminated record on the heap, or as aliased on the stack.
The solution in all cases is to use a wrapper type sans discriminant,
and instantiate the component using the wrapper type as the element type.
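[Editor's note: A minimal sketch of the wrapper technique described above, using hypothetical types Shape and Shape_Holder. The wrapper has no discriminants, so an aliased wrapper object can still have its Value component reassigned with a different discriminant.]

```ada
procedure Wrapper_Sketch is

   type Shape_Kind is (Circle, Rectangle);

   --  A default-discriminated (mutable) record type.
   type Shape (Kind : Shape_Kind := Circle) is record
      case Kind is
         when Circle =>
            Radius : Float;
         when Rectangle =>
            Width, Height : Float;
      end case;
   end record;

   --  Wrapper sans discriminant: this is what would be passed as the
   --  container's element type.
   type Shape_Holder is record
      Value : Shape;
   end record;

   type Holder_Access is access all Shape_Holder;

   --  Stands in for the aliased Element component of a container node.
   Slot : aliased Shape_Holder :=
     (Value => (Kind => Circle, Radius => 1.0));

   Ref : constant Holder_Access := Slot'Access;

begin
   --  Changing the discriminant through the access value is fine,
   --  because the aliased object is the wrapper, not the Shape.
   Ref.Value := (Kind => Rectangle, Width => 2.0, Height => 3.0);

   pragma Assert (Slot.Value.Kind = Rectangle);
   pragma Assert (Slot.Value.Width = 2.0);
end Wrapper_Sketch;
```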
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 2:40 PM
This seems like a real issue. Either the AI needs to specify that
default-discriminated records "don't work", as it were, or the
implementation needs to do the record-wrapping.
Tucker and I have run into this issue in our current project (I think I
wrote a container package, and Tucker instantiated it like that!), and it
wasn't entirely obvious what the best solution was.
****************************************************************
From: Gary Dismukes
Sent: Monday, February 9, 2004 2:49 PM
> It means you can't instantiate the container using a
> default-discriminated element type.
Not stated quite right -- you can instantiate the container with
such a type, but it might not work right. You might get mysterious
exceptions propagating out of operations if the implementation
reassigns to an Element component in a node.
> This is the same problem you have when trying to declare a
> default-discriminated record on the heap, or as aliased on the stack.
>
> The solution in all cases is to use a wrapper type sans discriminant,
> and instantiate the component using the wrapper type as the element type.
I think that's not an acceptable answer in this case. These aliased
element components are part of the implementation. The user shouldn't
need to know about them and it's an abstraction violation in my opinion
if the user is forced to wrap his element type. Instead it would seem
that the implementation has to do that wrapping. Ugly, but at least
it keeps the ugliness internal to the container implementation.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 2:57 PM
>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> This seems like a real issue. Either the AI needs to specify that
> default-discriminated records "don't work", as it were, or the
> implementation needs to do the record-wrapping.
The problem is that the element type is aliased. Wrapping it internally
won't work because Generic_Element returns an access object that
designates the element, not the wrapper.
You can't satisfy both conditions simultaneously. Personally I find
in-place modification of elements much more useful than being able to
store (unwrapped) default-discriminated elements.
One compromise solution is to only disallow instantiation of
Generic_Element, rather than the whole package, if the element type has
a default discriminant. But I don't know whether this is possible
within the language.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 9:31 AM
>>The solution in all cases is to use a wrapper type sans discriminant,
>>and instantiate the component using the wrapper type as the element type.
>
> I think that's not an acceptable answer in this case. These aliased
> element components are part of the implementation. The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type. Instead it would seem
> that the implementation has to do that wrapping. Ugly, but at least
> it keeps the ugliness internal to the container implementation.
That won't work. Generic_Element returns an access value that
designates an object of type Element_Type, not the internal wrapper
type. The problem is that objects of (default-discriminated)
Element_Type can't be aliased, so I'm not allowed to say Element'Access.
Perhaps there is some other solution. I'm not really sure...
****************************************************************
From: Gary Dismukes
Sent: Monday, February 9, 2004 3:47 PM
Matt Heaney wrote:
>
> That won't work. Generic_Element returns an access value that
> designates an object of type Element_Type, not the internal wrapper
> type. The problem is that objects of (default-discriminated)
> Element_Type can't be aliased, so I'm not allowed to say Element'Access.
True, that's a problem.
> Perhaps there is some other solution. I'm not really sure...
Another solution is to use 'Address and unchecked conversion
to the access type, and forget the aliased component. This is
starting to look unpleasant though :-(
What we really need is something like Tucker's proposal in AI-363
(eliminating access subtype problems), which would prevent this
pesky aliased problem altogether...
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 4:01 PM
Right. And that's still on the table, so there may ultimately be no problem
here for Ada 200Y.
****************************************************************
From: Simon J. Wright
Sent: Tuesday, February 9, 2004 3:16 AM
The Booch Components use Address_To_Access_Conversions for this
precise purpose.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 4:07 PM
Indeed. What I was trying to do with Generic_Element is something
similar to what you have in C++:
   {
      std::vector<int> v;
      v.push_back(42);

      int& i = v.back();
      ++i;  // i becomes 43
   }
The problem is that we don't have references in Ada. But even so you
can do something like this:
   type Integer_Access is access all Integer;

   function To_Access is
      new Integer_Vectors.Generic_Element (Integer_Access);

   declare
      V : Integer_Vectors.Vector_Type;
   begin
      Append (V, New_Item => 42);

      declare
         I : Integer renames To_Access (V, Last (V)).all;
      begin
         I := I + 1;  -- I becomes 43
      end;
   end;
This works but the model breaks if the element type has a default
discriminant.
In the case of Integer it is perhaps not necessary to use this
mechanism, but consider if the element of the container is another
container. You need a variable view of the container element in order
to manipulate it.
I wish there were some other way, something like:

   function Element (V : VT) return Element_Type'Reference;
   --in the pseudo vectors pkg

   declare
      V : Integer_Vectors.Vector_Type;
   begin
      Append (V, New_Item => 42);

      declare
         I : Integer renames Element (V, Last (V));
      begin
         I := I + 1;
      end;
   end;
Here Element_Type'Reference is some kind of virtual type that is limited
and indefinite. The only thing you're allowed to do with the value
returned by a function that returns T'Reference is to rename it.
But perhaps the ARG has some other, more elegant technique. Just food
for thought...
****************************************************************
From: Tucker Taft
Sent: Monday, February 9, 2004 5:50 PM
Gary Dismukes wrote:
> ...
> I think that's not an acceptable answer in this case. These aliased
> element components are part of the implementation. The user shouldn't
> need to know about them and it's an abstraction violation in my opinion
> if the user is forced to wrap his element type. Instead it would seem
> that the implementation has to do that wrapping. Ugly, but at least
> it keeps the ugliness internal to the container implementation.
I agree. Just declare a local record type that wraps the
user's type. And/or hope that the AI that solves this
problem gets accepted.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 10:18 AM
Tucker Taft wrote:
> I suggest the use of controlled types if you want implicit
> levels of indirection in the keys or the elements. Having the
> container worry about storage management issues relating to elements
> or keys significantly increases their complexity. We very much
> want these containers to be straightforward to define and use.
> They are definitely not the final answer, but more the initial
> answer -- the 20% that can handle 80% of the problems.
Ahhhh, the voice of reason. This is exactly right.
If you want indefinite key types, then you pay for that privilege by having
to do the memory management of indefinite keys yourself. This is how it
should be.
****************************************************************
From: Martin Krischik
Sent: Monday, February 9, 2004 12:40 PM
But you could not even store a collection of strings. OK, there are unbounded
strings. But storing 'Class, that's the killer feature. If Ada.Containers can't
do it I am not interested. There will be no 20%/80% split. It's 0% - I won't use
them.
****************************************************************
From: Marius Amado Alves
Sent: Monday, February 9, 2004 12:36 PM
Sounds more like the voice of the Devil, or at least De Sade, to me.
"Want indefinite? Go do memory management!" Too much pointer programming
in your minds, dudes. No doubt from much systems programming in your
résumés, but you forget not everybody is a systems programmer. For an
application programmer that 80% figure is just so wrong.
(Matt, "this is exactly right", "this is how it should be"? Assertive is
good but now you're sounding like some God (or Devil). I thought you
were an atheist ;-)
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 12:58 PM
Ada is a low-level systems programming language. It gives you the tools
to build higher-level abstractions.
If you need to store elements whose type is indefinite, then you have to
build that abstraction yourself, perhaps using the low-level containers
as a substrate.
As Tucker stated, the containers are the starting point, not the ending
point. Certainly, building the higher-level abstraction is much easier
with the low-level containers than without.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 12:53 PM
> But storing 'Class, that's the killer feature. If Ada.Containers can't
> do it I am not interested. There will be no 20%/80% split. It's 0% - I won't
> use them.
The library is designed around the common case, which means definite key
and element types.
If you want to store elements of type T'Class, then you have to use an
access type to instantiate the component, and then do the memory
management of elements yourself.
This is how it should be.
****************************************************************
From: Pascal Obry
Sent: Monday, February 9, 2004 1:15 PM
> Ada is a low-level systems programming language. It gives you the tools
> to build higher-level abstractions.
As you seem to like strong arguments, let me try this:
This is plain wrong :) Ada is not low-level and certainly not a system
programming language. Ada is a high-level language without a specific
domain, this is my point of view.
I find really strange that only Vector is being considered for example. It
would be really useful to have queue, list and stack. Now limiting the
containers to definite types is another restriction...
The idea behind the Ada containers was to have a common set of useful
components for Ada to avoid reinventing the wheel... So the argument
"If you need to store elements whose type is indefinite, then you have to
build that abstraction yourself" sounds bogus to me ;)
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 1:26 PM
You can use a vector as a stack. The library doesn't need to provide a
stack directly.
The library does not provide a list. I wish it had a list, but the
subcommittee had to reduce the scope of the library and so list didn't
make the cut.
You can use a list as a queue. The library doesn't need to provide a
queue directly. However, the library doesn't provide a list, so it
doesn't provide a queue either.
Note that if you need a priority queue, you can use the sorted set. The
library doesn't need to provide a priority queue directly.
> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
> build that abstraction yourself" sounds bogus to me ;)
I didn't mean that you have to build the component from scratch. I
meant only that you have to do the memory management of indefinite
elements yourself. The higher-level component that you build can be
implemented using the low-level containers.
Real systems are built from the bottom up. All we did was to provide
the lowest level in the abstraction hierarchy.
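[Editor's note: A minimal sketch of building a stack on top of a vector, as suggested above. It is written against Ada.Containers.Vectors as later standardized in Ada 2005, not the Vector_Type API of the proposal under discussion; Generic_Stacks and its operations are hypothetical names. The private part hides the vector, so clients get only the stack operations.]

```ada
with Ada.Containers.Vectors;

procedure Stack_Sketch is

   generic
      type Element_Type is private;
   package Generic_Stacks is
      type Stack is private;

      procedure Push (S : in out Stack; Item : in Element_Type);

      --  Raises Constraint_Error if the stack is empty.
      procedure Pop (S : in out Stack; Item : out Element_Type);

      function Is_Empty (S : Stack) return Boolean;
   private
      package Element_Vectors is
         new Ada.Containers.Vectors (Positive, Element_Type);

      type Stack is record
         Items : Element_Vectors.Vector;
      end record;
   end Generic_Stacks;

   package body Generic_Stacks is

      procedure Push (S : in out Stack; Item : in Element_Type) is
      begin
         Element_Vectors.Append (S.Items, Item);
      end Push;

      procedure Pop (S : in out Stack; Item : out Element_Type) is
      begin
         --  The top of the stack is the last element of the vector.
         Item := Element_Vectors.Last_Element (S.Items);
         Element_Vectors.Delete_Last (S.Items);
      end Pop;

      function Is_Empty (S : Stack) return Boolean is
      begin
         return Element_Vectors.Is_Empty (S.Items);
      end Is_Empty;

   end Generic_Stacks;

   package Integer_Stacks is new Generic_Stacks (Integer);
   use Integer_Stacks;

   S : Stack;
   X : Integer;

begin
   Push (S, 1);
   Push (S, 2);

   Pop (S, X);
   pragma Assert (X = 2);  --  last in, first out

   Pop (S, X);
   pragma Assert (X = 1 and Is_Empty (S));
end Stack_Sketch;
```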
****************************************************************
From: Pascal Obry
Sent: Monday, February 9, 2004 2:00 PM
Matthew,
> You can use a vector as a stack. The library doesn't need to provide a
> stack directly.
Except that a stack should have a far more limited set of operations. This
ensures that the stack abstraction is not worked around.
> The library does not provide a list. I wish it had a list, but the
> subcommittee had to reduce the scope of the library and so list didn't
> make the cut.
I really think that this should be reconsidered. A list is the most used
abstraction in much of the software I have built or seen.
> You can use a list as a queue.
Of course but again this is wrong in my view. The abstraction should be
constrained to the set of operations for a queue. In that case why not remove
the vector, it can be implemented easily with a map, the key is the index of
the item in the array :)
> Note that if you need a priority queue, you can use the sorted set. The
This is a higher-level component; I agree that it is OK not to include it.
If we miss some important components in the standard container library, what
will we do? Use another component library like Charles or PragmArc... and not use
the standard container library... so what's the point?
The most important point in a container library is *completeness* I would
say. This is exactly what STL has done.
****************************************************************
From: Martin Krischik
Sent: Monday, February 9, 2004 12:16 PM
> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type. This
> shouldn't be a problem since the map object is typically part of some
> higher-level abstraction anyway, so you can hide the allocation and map
> manipulation from the users of that higher-level abstraction.
But Ada hasn't got a garbage collector so there is the deallocation problem.
Especially when the container is copied or passed around.
And Ada (unlike C++) can do better! With Ada you can have a container with
indefinite types where with C++ you can't. We should not give away that
advantage.
****************************************************************
From: Marius Amado Alves
Sent: Monday, February 9, 2004 1:07 PM
> Ada is a low-level systems programming language. It gives you the
> tools to build higher-level abstractions.
Ok. Thanks for recentring the argument. So your position is that the
standard should not give high-level facilities. Personally I see Ada's
doom in that position. A stillborn Ada 2005.
****************************************************************
From: Pascal Obry
Sent: Monday, February 9, 2004 2:03 PM
Sadly, I feel the same :(
****************************************************************
From: Stephen Leake
Sent: Monday, February 9, 2004 1:56 PM
> If you want indefinite key types, then you pay for that privilege by
> having to do the memory management of indefinite keys yourself. This
> is how it should be.
Ok. I'd like to see that rationale documented in the final version of
the AI, so people understand why Ada.Containers.String_Map isn't
simply an instantiation of Ada.Containers.Map.
One more argument for indefinite keys; if a C++ person looks at this,
they can say "Ada generics are so weak they can't even allow a String
as a key!". Not good for the "let's attract more users" goal.
And I will continue to use SAL, where the containers do the memory
management, because I like that design point better :).
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 2:13 PM
Stephen Leake wrote:
> One more argument for indefinite keys; if a C++ person looks at this,
> they can say "Ada generics are so weak they can't even allow a String
> as a key!". Not good for the "let's attract more users" goal.
But you can't do that in C++, either. Indeed, C++ doesn't have
indefinite types so it's unlikely a C++ programmer would even think to
ask that question.
> And I will continue to use SAL, where the containers do the memory
> management, because I like that design point better :).
Real systems are built from the bottom up. All we did was to provide
the lowest-level in the abstraction hierarchy.
****************************************************************
From: Stephen Leake
Sent: Monday, February 9, 2004 4:20 PM
> But you can't do that in C++, either. Indeed, C++ doesn't have
> indefinite types so it's unlikely a C++ programmer would even think to
> ask that question.
Hmm. To be specific;
can a C++ STL Map be instantiated with a C++ STL String as the Key?
I'll have to check, but I bet the answer is "yes".
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 6:27 PM
Yes of course an STL map can be instantiated with type std::string as the
key, but that type is analogous to Ada's Unbounded_String, not String.
> I'll have to check, but I bet the answer is "yes".
Yes it can, but you're comparing apples and oranges.
****************************************************************
From: Stephen Leake
Sent: Monday, February 9, 2004 8:36 PM
Ok. And Ada.Containers.Map can be instantiated with Unbounded_String
as the Key. Good enough.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 2:22 PM
Pascal Obry wrote:
> Except that a stack should have a far more limited set of operations. This
> ensures that the stack abstraction is not worked around.
Fine. Then you can implement that stack abstraction yourself, using a
vector as the implementation.
> I really think that this should be reconsidered. A list is the most used
> abstraction in much of the software I have built or seen.
I think so too, but the subcommittee had to reduce the scope of the
proposal and so lists didn't make the cut.
If you ask for too much then you might not get anything.
> Of course but again this is wrong in my view. The abstraction should be
> constrained to the set of operations for a queue.
Fine. Then you can implement that queue abstraction yourself, using a
list as the implementation.
>In that case why not remove
> the vector, it can be implemented easily with a map, the key is the index of
> the item in the array :)
That would be an example of "abstraction inversion": using a
higher-level abstraction to implement a more low-level one.
This is the mistake they made in Ada83, requiring that high-level tasks
be used to implement low-level synchronization constructs as semaphores
and monitors.
Ada is a low-level systems programming language. It is not Perl.
> If we miss some important components in the standard container library, what
> will we do? Use another component library like Charles or PragmArc... and not use
> the standard container library... so what's the point?
Do whatever you're doing now.
The intent of the committee is that this small, modest set of containers
will provide the impetus for a secondary standard.
> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.
Well, my original proposal included all the containers in the STL and
then some. So don't blame me!
****************************************************************
From: Pascal Obry
Sent: Monday, February 9, 2004 2:52 PM
> Fine. Then you can implement that stack abstraction yourself, using a
> vector as the implementation.
Of course, I can also implement everything myself :)
> Fine. Then you can implement that queue abstraction yourself, using a
> list as the implementation.
Of course, I can also implement everything myself :)
> That would be an example of "abstraction inversion": using a
> higher-level abstraction to implement a more low-level one.
As it is to implement a stack over a vector abstraction.
> Ada is a low-level systems programming language. It is not Perl.
It is not Perl, but it is not a low-level systems programming
language either :) And yes, I'll keep repeating this :)
> Do whatever you're doing now.
But I don't !!! That's the whole point of the container library.
> The intent of the committee is that this small, modest set of containers
> will provide the impetus for a secondary standard.
Ok. That's a point.
> > The most important point in a container library is *completeness* I would
> > say. This is exactly what STL has done.
>
> Well, my original proposal included all the containers in the STL and
> then some. So don't blame me!
I know, Matthew, and I want to thank you for the hard work. I just expected a
bit more, so I'm frustrated :)
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 2:00 PM
Martin Krischik wrote:
> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especially when the container is copied or passed around.
You are responsible for memory management of the indefinite elements.
Implement your high-level abstraction using the low-level container,
instantiated with an access type.
> And Ada (unlike C++) can do better! With Ada you can have a container with
> indefinite types where with C++ you can't. We should not give away that
> advantage.
There is only a slight difference here between Ada95 and C++. In Ada95
you can do this:
   procedure Insert (C : in out CT; E : in ET) is
      EA : constant ET_Access := new ET'(E);
   begin
      ...
This will work even if ET is indefinite.
In C++ the type has to have a clone operator or whatever:
   void insert(const e_t& e)
   {
      e_t* const pe = e.clone();
      ...
   }
Internally the components wouldn't be any different.
****************************************************************
From: Stephen Leake
Sent: Monday, February 9, 2004 2:04 PM
Matthew Heaney <mheaney@on2.com> writes:
> Stephen Leake wrote:
>
> > What is the rationale for making the Map Key_Type definite, as opposed
> > to indefinite? Since an indefinite Key_Type is required for
> > Containers.Maps.Strings, why not make that capability available to the
> > users?
>
> Because that would punish users that have definite key types.
Can you elaborate on this? I don't see it.
> Also, type String isn't just any indefinite type. It's an array.
>
> The reference implementation for String_Maps looks like this:
>
> type Node_Type;
> type Node_Access is access Node_Type;
>
> type Node_Type (Key_Length : Natural) is
> record
> Key : String (1 .. Key_Length);
> Element : aliased Element_Type;
> Next : Node_Access;
> end record;
Obviously you can optimize a container if you know the specific types
involved. But the standard containers aren't supposed to be about
highly optimized code; they are supposed to be about generally useful
code.
> > I don't see a discussion of this in AI-302-03/01.
>
> There is a paragraph in there explaining why we have a dedicated maps
> whose key type is String.
Yes. It does _not_ say why Ada.Containers.Maps.Key_Type is _not_
indefinite. That's what I'd like to see.
> > Another point: Containers.Vectors.Size should return
> > Index_Type'Base, and the Size parameter in Resize should also be
> > Index_Type'Base. It's confusing to have different types for Size
> > and Index.
>
> No. The parameter of the Resize operation specifies a hint about the
> future length of the container, which is subtype Natural.
Why is it Natural? Randy pointed out that Index_Type'Base might not
include 0, or might even be an enumeration type. I'd rather see Index_Type be
specified as a signed integer, including 0, rather than have Size
return a type that is not Index_Type. (SAL makes this choice).
> > There's also a problem if Natural'Last < Index_Type'Last; you
> > can't have a vector that contains every index!
>
> The assumption is that a container will always have fewer than
> Integer'Last number of elements. (On a 32-bit machine that's about 2.1
> billion values...)
And that assumption is precisely the problem. On systems where
Integer'Last is 2**15 - 1, you can't have large containers. Ada must not
make such assumptions!
****************************************************************
From: Stephen Leake
Sent: Monday, February 9, 2004 2:14 PM
> The internal node of the map reference implementation looks like this:
Ok. That makes sense. I suggest this level of detail be kept in the
Rationale for the Ada.Containers package.
I address this issue in SAL
(http://www.toadmail.com/~ada_wizard/ada/sal.html) by allowing the
user to specify both the Key_Type and the Key_Node_Type, and provide a
function To_Key_Node to go from one to the other. For definite keys,
the types are the same, and To_Key_Node is an inlined null function,
so there is no overhead. For indefinite keys, that function does the
allocation.
Hm. In shared code generics, I guess the "inlined null function" does
not get optimized away. So perhaps this would not be an appropriate
approach for a standard Ada package.
Actually, in SAL, keys are always stored in the Items, so you'll only
see Item_Type, Key_Type, and Item_Node_Type, not Key_Node_Type. But
the principle is the same.
It is more complex to instantiate SAL containers than the proposed
Ada.Containers.Map. But I would argue that it is worth it.
> If you want an indefinite key type, then allocate the key object
> yourself and instantiate the component using the key access type.
> This shouldn't be a problem since the map object is typically part of
> some higher-level abstraction anyway, so you can hide the allocation
> and map manipulation from the users of that higher-level
> abstraction.
Ok. In SAL, I don't have two layers. And I agree with others who say
that Ada should provide a useful container that does "typical" memory
management tasks for you.
But any container is better than none :).
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Monday, February 9, 2004 3:05 PM
Pascal Obry wrote:
> Ada is not low-level and certainly not a system
> programming language. Ada is a high-level language without a specific
> domain, this is my point of view.
Self-contradictory viewpoint, though - because high level language without a
specific domain and low-level system programming language are roughly the same
thing -:)
> The idea behind the Ada containers was to have a common set of useful
> components for Ada to avoid reinventing the wheel... So the argument
> "If you need to store elements whose type is indefinite, then you have to
> build that abstraction yourself" sounds bogus to me ;)
If we call them "containers" then they should, in some substantial sense,
*contain* things, not just refer to them. So, in this case, they should do
all associated memory management. Otherwise, they aren't Containers, they are
Inventories. It is the improper name that confuses the matter and creates heated
argument.
Also, it seems that the library is planned without looking at new features
in Ada2005, particularly, interfaces. I think that this (if true) may be a
serious mistake. Interfaces may provide a way for reconciling different
requirements.
****************************************************************
From: Ehud Lamm
Sent: Tuesday, February 10, 2004 1:04 AM
I would be very happy to see an Ada.Container.Interfaces (or
Ada.Container.Signatures) package/hierarchy, specifying APIs, which could
then be used to achieve (static) polymorphism.
I think this is the place to provide Stack, Queue interfaces etc. as well.
I think that's a good way to encourage the building block approach.
As far as I recall the workshop we had in Vienna (right?), not many shared
my enthusiasm, alas.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, February 10, 2004 6:53 PM
Alexandre E. Kopilovitch wrote:
...
> Also, it seems that the library is planned without looking at new features
> in Ada2005, particularly, interfaces. I think that this (if true) may be a
> serious mistake. Interfaces may provide a way for reconciling different
> requirements.
I wondered how long it would be before someone asked that question.
I did in fact do some (idle) thinking on that question, and I concluded that
interfaces wouldn't be useful for the containers library.
What you'd like is to be able to write interfaces that describe iteration,
for example, and be able to use those without knowing anything about the
underlying container. Similarly, you could have a sequence interface that
worked with any sequence container.
However, that doesn't really work. The primary problem is that the profiles
of the operations of an interface are fixed other than the object itself.
But, for a container, the operations contain a generic formal type (the
element type), as well as the object type. That means that general
interfaces (like the ones described above, for example) can't be written
that would match any possible element type, only a specific element type
(which is pretty useless).
One way to get around that would be to put the interfaces into the generic
units. But then, the interfaces would only be usable with that container --
hardly a useful interface! You might as well just use the container
directly.
A better way would be to make the element type an interface itself. Then you
could write useful non-generic interfaces. But that would limit the
contained objects to types that can have an interface: tagged types, and
perhaps task and protected types (and of course have the required
interface). That sort of limitation isn't going to fly for the primary
container library - a container of access values is just too common and
important. (I could imagine an O-O offshoot that worked that way - in a
secondary standard.)
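[Editor's note: The following sketch is not part of the original mail. It
illustrates the first workaround described above (putting the interface into
the generic unit), assuming the interface syntax that was eventually adopted
for Ada 2005. The names Sequence_Interfaces, Sequence, Append and Length are
hypothetical and do not come from the proposal.]

```ada
--  Hypothetical sketch: an interface declared inside a generic unit.
--  The element type appears in the profile of Append, so the interface
--  can only be written once Element_Type is known.
generic
   type Element_Type is private;
package Sequence_Interfaces is

   type Sequence is limited interface;

   --  Append one element to the end of the sequence.
   procedure Append
     (Container : in out Sequence;
      New_Item  : in     Element_Type) is abstract;

   --  Number of elements currently held.
   function Length (Container : Sequence) return Natural is abstract;

end Sequence_Interfaces;
```

[Each instantiation of such a package declares its own, distinct Sequence
type, so code written against the instance for one element type cannot be
reused with the instance for another. This is the limitation the message
points out.]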
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Tuesday, February 10, 2004 9:45 PM
Randy Brukardt wrote:
> I did in fact do some (idle) thinking on that question, and I concluded that
> interfaces wouldn't be useful for the containers library.
>
> What you'd like is to be able to write interfaces that describe iteration,
> for example, and be able to use those without knowing anything about the
> underlying container. Similarly, you could have a sequence interface that
> worked with any sequence container.
Yes.
> However, that doesn't really work. The primary problem is that the profiles
> of the operations of an interface are fixed other than the object itself.
> But, for a container, the operations contain a generic formal type (the
> element type), as well as the object type. That means that general
> interfaces (like the ones described above, for example) can't be written
> that would match any possible element type, only a specific element type
> (which is pretty useless).
This shows an unpleasant incompatibility of interfaces with generics. Well,
perhaps "incompatibility" is too strong a word for that, but anyway there is
some inconsistency; these notions do not collaborate smoothly. And this is
a general issue, regardless of the container library.
> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.
Yes, this is clearly a poor way.
> A better way would be to make the element type an interface itself. Then you
> could write useful non-generic interfaces. But that would limit the
> contained objects to types that can have an interface: tagged types, and
> perhaps task and protected types (and of course have the required
> interface). That sort of limitation isn't going to fly for the primary
> container library - a container of access values is just too common and
> important.
I don't understand the latter sentence - I thought that access to interfaces
is permitted... I'm looking at the last example in AI-251 (under the line
"A somewhat less artifical example") - there is type Object_Reference, which
is access to interface type Monitored_Object'Class, and this Object_Reference
is used for parameters of procedures Register and Unregister.
And if you meant that those access values may point to untagged types then
I think that "boxing" those untagged types will not significantly annoy a
programmer.
But anyway I don't think that this way is generally better. It artificially
pushes a container into the position of "controlling object", which isn't a good
thing. And it often convolutes thinking... seems no better than typical C++
puzzles, a maintainer's hell.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, February 10, 2004 11:03 PM
> I don't understand the latter sentence - I thought that access to interfaces
> is permitted... I'm looking at the last example in AI-251 (under the line
> "A somewhat less artifical example") - there is type Object_Reference, which
> is access to interface type Monitored_Object'Class, and this Object_Reference
> is used for parameters of procedures Register and Unregister.
Yes, but access types themselves are not tagged. What they point at is
irrelevant. If you have a formal "type T is tagged private;" no access type
will match that; it's the same for interfaces.
You could of course wrap the access type in a tagged record, and give the
interface to that, and then the element type could be that. But then you have
an extra component name in every use, which is annoying.
For lower-level uses, having a vector/sequence of pointers or a map of pointers
certainly sounds useful and common; forcing wrapping is not going to win any
style points.
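[Editor's note: The following sketch is not part of the original mail. It
illustrates the matching rule and the wrapping workaround described above.
The names Wrap_Demo, Tagged_Holder, Int_Access and Wrapped_Access are
hypothetical.]

```ada
with Ada.Text_IO;

procedure Wrap_Demo is

   --  A generic whose formal type must be tagged.
   generic
      type T is tagged private;
   package Tagged_Holder is
      Item : T;
   end Tagged_Holder;

   type Int_Access is access all Integer;

   --  Wrapper: a tagged record whose only component is the access value.
   type Wrapped_Access is tagged record
      Ptr : Int_Access;
   end record;

   --  Illegal: an access type is not tagged, whatever it designates.
   --  package Bad is new Tagged_Holder (Int_Access);

   --  Legal: the wrapper is tagged.
   package Good is new Tagged_Holder (Wrapped_Access);

   Value : aliased Integer := 42;

begin
   --  Note the extra component name (Ptr) needed at every use.
   Good.Item.Ptr := Value'Access;
   Ada.Text_IO.Put_Line (Integer'Image (Good.Item.Ptr.all));
end Wrap_Demo;
```

[The extra ".Ptr" in each use is the annoyance the message refers to.]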
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 4:28 PM
Regarding support for indefinite keys,
Martin Krischik said:
> But you could not even store a collection of strings. Ok, there are
> unbounded strings. But storing 'Class, that's the killer feature. If
> Ada.Containers can't do it I am not interested. There will be no 20%/80%
> split. It's 0% - I won't use them.
How about this: you write a package that supports the indefinite case,
and you build it on top of the (currently proposed) standard package
that supports only definite? The definite-only package takes care of
the hashing or whatever, and your package takes care of memory
management for the indefinite keys.
Maybe you try to get your package to be a de-facto standard, or a
secondary standard.
The point is, you *can* use the definite-only package, but only
indirectly, via a wrapper package. The definite-only package isn't
useless; it does *part* of the job you desire. This seems like a better
design than making a single package that supports both, and somehow
magically optimizes the definite cases.
If the RM supports indefinite, I claim it should do so by providing two
separate packages. But we're trying to minimize the size of all this,
so we choose just the lower-level one of those.
Yeah, it would be nice if the RM provided both...
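[Editor's note: The following sketch is not part of the original mail. It
shows one way the layering described above might look for a collection of
strings. It assumes the Ada.Containers.Vectors package as it was eventually
standardized in Ada 2005, not the API under discussion in this thread. The
names Indefinite_Demo, String_Access, Pointer_Vectors and Add_Name are
hypothetical, and deallocation is deliberately omitted.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Indefinite_Demo is

   --  The indefinite type (String) is held through a definite access type.
   type String_Access is access String;

   --  The definite-only container does the sequence bookkeeping.
   package Pointer_Vectors is new Ada.Containers.Vectors
     (Index_Type   => Positive,
      Element_Type => String_Access);

   Names : Pointer_Vectors.Vector;

   --  The wrapper layer takes care of allocating the indefinite value.
   procedure Add_Name (Item : in String) is
   begin
      Names.Append (new String'(Item));
   end Add_Name;

begin
   Add_Name ("short");
   Add_Name ("a considerably longer string");

   for I in Names.First_Index .. Names.Last_Index loop
      Ada.Text_IO.Put_Line (Names.Element (I).all);
   end loop;
end Indefinite_Demo;
```

[A real wrapper package would also have to free the allocated strings when
elements are deleted or the container goes away; that is the memory
management the message assigns to the wrapper.]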
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 5:36 PM
This seems like an ideal candidate for the hoped-for containers secondary
standard.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 11:09 AM
Randy Brukardt wrote:
> If we want an array sort, we should declare one:
>
> generic
> type Index_Type is (<>);
> type Element_Type is private;
> with function "<" (Left, Right : Element_Type) return Boolean is <>;
> type Array_Type is array (Index_Type) of Element_Type;
> procedure Ada.Generic_Sort (Arr : in out Array_Type);
>
> (We'd need an unconstrained version, too.) But keep it separate from the
> Vector one (or any List one, for that matter).
I added the following library-level declarations to the latest
reference implementation:
AI302.Containers.Generic_Sort_Constrained_Array
AI302.Containers.Generic_Sort_Unconstrained_Array
AI302.Containers.Generic_Sort
The latter works for any sequence having a random-access iterator, um, I
mean cursor.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209.zip>
They're all basically the same: a simple quicksort using a median-of-3
to choose a pivot.
The Generic_Sort for the vector is implemented as an instantiation of
the generic sort for arrays.
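[Editor's note: The following sketch is not part of the original mail and is
not the reference implementation. It shows what a quicksort with a
median-of-3 pivot might look like for the constrained-array generic quoted
above. The names Sort_Demo, Quick, Swap and Elem are hypothetical, and the
sketch assumes that index positions fit in Integer.]

```ada
with Ada.Text_IO;

procedure Sort_Demo is

   generic
      type Index_Type is (<>);
      type Element_Type is private;
      with function "<" (Left, Right : Element_Type) return Boolean is <>;
      type Array_Type is array (Index_Type) of Element_Type;
   procedure Generic_Sort (Arr : in out Array_Type);

   procedure Generic_Sort (Arr : in out Array_Type) is

      --  Element at integer position P; working on positions keeps the
      --  index arithmetic legal for any discrete Index_Type.
      function Elem (P : Integer) return Element_Type is
      begin
         return Arr (Index_Type'Val (P));
      end Elem;

      procedure Swap (P, Q : Integer) is
         Tmp : constant Element_Type := Elem (P);
      begin
         Arr (Index_Type'Val (P)) := Elem (Q);
         Arr (Index_Type'Val (Q)) := Tmp;
      end Swap;

      procedure Quick (Lo, Hi : Integer) is
         Mid   : constant Integer := Lo + (Hi - Lo) / 2;
         I     : Integer := Lo;
         J     : Integer := Hi;
         Pivot : Element_Type;
      begin
         if Lo >= Hi then
            return;
         end if;

         --  Median-of-3: order the first, middle and last elements so
         --  that the median of the three ends up in the middle.
         if Elem (Mid) < Elem (Lo) then Swap (Mid, Lo); end if;
         if Elem (Hi) < Elem (Lo) then Swap (Hi, Lo); end if;
         if Elem (Hi) < Elem (Mid) then Swap (Hi, Mid); end if;
         Pivot := Elem (Mid);

         --  Partition around the pivot value.
         while I <= J loop
            while Elem (I) < Pivot loop I := I + 1; end loop;
            while Pivot < Elem (J) loop J := J - 1; end loop;
            if I <= J then
               Swap (I, J);
               I := I + 1;
               J := J - 1;
            end if;
         end loop;

         Quick (Lo, J);
         Quick (I, Hi);
      end Quick;

   begin
      Quick (Index_Type'Pos (Arr'First), Index_Type'Pos (Arr'Last));
   end Generic_Sort;

   type Small_Index is range 1 .. 6;
   type Int_Array is array (Small_Index) of Integer;

   procedure Sort is new Generic_Sort (Small_Index, Integer, "<", Int_Array);

   Data : Int_Array := (5, 3, 6, 1, 4, 2);

begin
   Sort (Data);
   for I in Data'Range loop
      Ada.Text_IO.Put (Integer'Image (Data (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Sort_Demo;
```

[The demonstration prints the six values in ascending order. A vector sort
could be layered on such a generic by instantiating it over the vector's
underlying array, as the message describes.]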
****************************************************************
From: Robert A. Duff
Sent: Sunday, February 8, 2004 12:09 PM
Marius Amado Alves wrote:
> In the meanwhile, there is no requirement that Ada.Containers be
> implemented strictly in Ada, is there?
No. However, there is a "meta requirement" that Ada.Containers be
implementABLE in Ada, and I expect all implementations will be in plain
vanilla Ada without compiler-specific tricks.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 1:41 PM
The proposal can be implemented in Ada today. In fact it already is:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>
****************************************************************
From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:58 AM
I agree. I think the meta requirement is the way to go. If there is some good
reason to resort to non-Ada code, it should be allowed, so long as the API is
maintained. BUT, it would reflect badly on the language if the only way to
implement this sort of library efficiently would require going outside the
scope of the language. Remember Ada is a general-purpose, reuse-oriented
language.
One of the reasons I wanted this discussion (and I pushed for a standard
container library back when practically no one wanted to hear...) is that I
think that by working on standard libraries it is easier to focus on areas
where the language needs improvement.
I think this is in fact what's happening right now...
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 2:37 PM
Right, and my point was that I want to keep it that way.
I suggest the AI mention this "meta requirement" in its discussion.
Some folks have suggested some sort of compiler-specific "magic" going
on behind the scenes. I don't want that.
>... In fact it already is:
>
> <http://home.earthlink.net/~matthewjheaney/charles/ai302-20040209b.zip>
I thank you for your hard work on this. I haven't had a chance to look
at it yet, though. What sort of copyright does it have? Can the
various implementers just take your code and use it as their
implementation of this AI?
****************************************************************
From: Matthew Heaney
Sent: Monday, February 9, 2004 2:46 PM
Yes. That was the intent.
We can attach any copyright necessary to allow implementors or anyone
else to use it.
Will the GMGPL work? I'm not an expert on these matters.
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 4:36 PM
I suspect the GMGPL would work, but I'm not an expert on these matters,
either. I suggest you ask Robert Dewar.
****************************************************************
From: Pascal Leroy
Sent: Monday, February 9, 2004 10:41 AM
> I've just posted the report of the containers committee on
> Ada-Comment. The executive summary follows. You can read the
> whole report in the !appendix to AI-00302-3/01, which you can
> find at: http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT
> or you can download the ZIP or tar files from:
> http://www.ada-auth.org/ais.html
Good job. A few comments after a first perusal:
1 - Insisting on O(N log N) complexity for the sorting algorithm
excludes Shellsort. This is misguided in my opinion, as Shellsort often
behaves better in practice than Quicksort (in particular, if the input
file is nearly in order).
2 - I would really like it if the definition of containers were written
without a particular implementation in mind. It's OK to explain that a
Vector is logically an array, but _requiring_ that insertion at the
beginning should take time O(N) is nonsensical! This is preventing
possibly better implementations. I have also seen in a mail by Randy
that element access has to be in O(1) (somehow I can't find this in the
AI). Again, I believe that this is overspecification. A skip list
would be in my opinion a perfectly good implementation of a Vector, as
in most practical situations the difference between O(1) and O(Log N)
doesn't matter. But the O(1) requirement precludes a skip list
implementation...
3 - Similarly, I don't understand why the definition of Maps insists on
a hash-based implementation. I have no problem with the notion that
this generic takes a hash-function, as this can be generally useful
whatever the implementation strategy. But I don't see why it's
necessary to insist on or expose the details of a hash-based
implementation. For large maps, a tree-based implementation probably
makes more sense. We should not prevent such an implementation.
Furthermore, the description seems to require a hash-based
implementation that tries to keep the collision lists reasonably short
(by increasing the number of buckets) and that can lead to very
expensive deallocation/reallocation.
4 - Like others, I don't like the type names ending in _Type (but I
realize that's a matter of taste). More seriously, I don't like the
usage of the word Vector, as this word is already used by AI 296. Since
it might make perfect sense to have a vector-302 of vectors-296 (e.g.
successive positions of a mobile) the terminology is only going to cause
confusion among users. Of all the proposals that I have seen, Sequence
has my preference. And I don't give a damn what the terminology is in
Java or C++.
****************************************************************
From: Robert Dewar
Sent: Monday, February 9, 2004 11:02 AM
> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort. This is misguided in my opinion, as Shellsort often
> behaves better in practice than Quicksort (in particular, if the input
> file is nearly in order).
Or what about linear sorts like address calculation :-)
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind. It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical! This is preventing
> possibly better implementations. I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI). Again, I believe that this is overspecification. A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter. But the O(1) requirement precludes a skip list
> implementation...
I agree this is over specified. Also, O(1) is a bit bogus given caches
anyway.
> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation. I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy. But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation. For large maps, a tree-based implementation probably
> makes more sense. We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.
I agree with Pascal here entirely.
> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste). More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296. Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users. Of all the proposals that I have seen, Sequence
> has my preference. And I don't give a damn what the terminology is in
> Java or C++.
I really think the _Type suffix should be avoided; with few exceptions
it is not at all RM style.
****************************************************************
From: Tucker Taft
Sent: Monday, February 9, 2004 12:37 PM
> 1 - Insisting on O(N log N) complexity for the sorting algorithm
> excludes Shellsort. This is misguided in my opinion, as Shellsort often
> behaves better in practice than Quicksort (in particular, if the input
> file is nearly in order).
The complexity specifications are intended to set expectations,
without being overly prescriptive. If there are no shared expectations,
then the containers can end up being frustrating to use.
As usual, the requirement associated with a complexity specification
is that as N => infinity, there is some upper bound on the ratio
between the actual time and the given formula. We should also
make it clear whether this is for the average case, or the worst case.
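[Editor's note: The following formula is not part of the original mail. It
restates the requirement described above in the usual notation, where T(N)
is the actual time for a container of N elements, f(N) is the given formula,
and the constants c and N_0 are implementation-dependent.]

```latex
T(N) = O\bigl(f(N)\bigr)
\iff
\exists\, c > 0,\ \exists\, N_0 :\quad
\forall N \ge N_0,\quad \frac{T(N)}{f(N)} \le c
```

[Read this way, the specification bounds the growth from above only; it does
not say whether T(N) is the average or the worst case, which is the point
the message raises.]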
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind. It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!
Clearly we should say "no worse than O(N)".
> ... This is preventing
> possibly better implementations. I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI). Again, I believe that this is overspecification. A skip list
> would be in my opinion a perfectly good implementation of a Vector, as
> in most practical situations the difference between O(1) and O(Log N)
> doesn't matter. But the O(1) requirement precludes a skip list
> implementation...
I am not an expert on skip lists, but it seems critical to appropriate use
that any element of a vector is "directly addressable". Random access
is a fundamental part of the abstraction, and if that is not efficient,
it will be very hard to create applications that work reasonably across
implementations. There needs to be some kind of bound on random
access. If you believe O(Log N) is acceptable, we can consider that.
For a vector, I personally expect O(1), where the constant factor is *very*
small, and the per-component space overhead ratio is no worse than 100%,
even for byte-sized components.
> 3 - Similarly, I don't understand why the definition of Maps insists on
> a hash-based implementation. I have no problem with the notion that
> this generic takes a hash-function, as this can be generally useful
> whatever the implementation strategy. But I don't see why it's
> necessary to insist on or expose the details of a hash-based
> implementation. For large maps, a tree-based implementation probably
> makes more sense.
Why? I would have thought just the opposite. A hashed map can provide
an average case of O(1), and there is nothing precluding using trees
for the few hash buckets that get big.
> ... We should not prevent such an implementation.
> Furthermore, the description seems to require a hash-based
> implementation that tries to keep the collision lists reasonably short
> (by increasing the number of buckets) and that can lead to very
> expensive deallocation/reallocation.
I feel like you are arguing both sides of the coin here. You are objecting
to the behavior while at the same time saying we shouldn't specify it.
If it is clear that this is an abstraction whose performance is no worse
than an extensible hash table, then it is more likely it will be used
appropriately. By doubling on each expansion, the number of reallocations
can be kept relatively small, and the pieces left behind are generally
just the right size for other growing hash tables.
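[Editor's note: The following sketch is not part of the original mail. It
counts how many reallocations a doubling strategy needs for a given number
of insertions, under the assumption that the table starts with capacity 1
and is expanded only when it is full. The names Doubling_Demo, Inserts,
Capacity, Reallocations and Copies are hypothetical.]

```ada
with Ada.Text_IO;

procedure Doubling_Demo is

   Inserts       : constant := 1_000_000;
   Capacity      : Natural  := 1;
   Length        : Natural  := 0;
   Reallocations : Natural  := 0;
   Copies        : Natural  := 0;

begin
   for I in 1 .. Inserts loop
      if Length = Capacity then
         --  The table is full: allocate a table twice as large and
         --  move every existing entry into it.
         Copies        := Copies + Length;
         Capacity      := Capacity * 2;
         Reallocations := Reallocations + 1;
      end if;
      Length := Length + 1;
   end loop;

   Ada.Text_IO.Put_Line ("Reallocations:" & Natural'Image (Reallocations));
   Ada.Text_IO.Put_Line ("Entries moved:" & Natural'Image (Copies));
end Doubling_Demo;
```

[For one million insertions this model performs 20 reallocations, and the
total number of entries moved stays below twice the number of insertions,
which is why the cost per insertion remains bounded on average.]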
I suppose you could say that you will implement it in a way that makes
your particular customers happy, but I don't think that is a way to
create a standard. The goal is portability, not only in terms
of correct execution, but also in terms of reasonable, relatively
predictable performance.
I agree we shouldn't overspecify, but nor should we underspecify. We
need to specify enough to establish useful, reasonable expectations
for implementors and users, so the container library is not just a
toy, but is actually a useful part of the professional Ada programmer's
toolkit. We certainly should never discourage implementors from
doing better than the minimal requirements, but nor should we encourage
them to deviate so much from the minimal requirements that they have
effectively created a different abstraction, interfering with
portability.
I see the error bounds specified for the elementary functions as a similar
exercise. They establish expectations, which reduces confusion and
frustration, and helps make it clear when the language-defined functions
can be used appropriately, and when they can't.
> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste). More seriously, I don't like the
> usage of the word Vector, as this word is already used by AI 296. Since
> it might make perfect sense to have a vector-302 of vectors-296 (e.g.
> successive positions of a mobile) the terminology is only going to cause
> confusion among users. Of all the proposals that I have seen, Sequence
> has my preference. And I don't give a damn what the terminology is in
> Java or C++.
Of course you don't give a damn. But the question is whether other users
who do write significant amounts of code in other languages will appreciate
the effort to be part of the mainstream, rather than always trying to
swim in our own creek, elegant and pure as it may be.
****************************************************************
From: Robert Dewar
Sent: Monday, February 9, 2004 1:10 PM
Tucker Taft wrote:
> The complexity specifications are intended to set expectations,
> without being overly prescriptive. If there are no shared expectations,
> then the containers can end up being frustrating to use.
> As usual, the requirement associated with a complexity specification
> is that as N => infinity, there is some upper bound on the ratio
> between the actual time and the given formula. We should also
> make it clear whether this is for the average case, or the worst case.
Big O is not an upper bound, it is a description of asymptotic behavior.
As written, this spec would prohibit a sort whose behavior was
asymptotically linear.
> I am not an expert on skip lists, but it seems critical to appropriate use
> that any element of a vector is "directly addressable". Random access
> is a fundamental part of the abstraction, and if that is not efficient,
> it will be very hard to create applications that work reasonably across
> implementations. There needs to be some kind of bound on random
> access. If you believe O(Log N) is acceptable, we can consider that.
> For a vector, I personally expect O(1), where the constant factor is *very*
> small, and the per-component space overhead ratio is no worse than 100%,
> even for byte-sized components.
What does constant factor mean here? A typical implementation of arrays
will have extreme variable behavior depending on caching. A naive model
in which all access is constant time is unrealistic in any case.
> Why? I would have thought just the opposite. A hashed map can provide
> an average case of O(1), and there is nothing precluding using trees
> for the few hash buckets that get big.
I personally think that any comments about performance should be
implementation advice, not requirements. You will get into all kinds
of formal mess if you try to make them requirements, but as IA they
are fine and comprehensible.
> Of course you don't give a damn. But the question is whether other users
> who do write significant amounts of code in other languages will appreciate
> the effort to be part of the mainstream, rather than always trying to
> swim in our own creek, elegant and pure as it may be.
To me, sequence *is* more mainstream than vector. The latter phrase
comes with far too much baggage :-)
****************************************************************
From: Stephane Barbey
Sent: Monday, February 9, 2004 1:54 PM
Both IDL and UML (OCL) use "Sequence" for unbounded collections
of ordered elements that allow the same element more than once.
OCL offers Set, Bag, Sequence and Collection.
The Ada mapping to IDL offers a Corba.Sequences.Unbounded
(and Bounded) package that are similar in spirit (and in
specification) to what the Ada.Strings.Bounded and
Unbounded packages provide.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 2:07 PM
> I personally think that any comments about performance should be
> implementation advice, not requirements. You will get into all kinds
> of formal mess if you try to make them requirements, but as IA they
> are fine and comprehensible.
All of the performance "requirements" *are* written as Implementation
Advice. There isn't any way that I can think of to make them normative, and
in any case, that would be overspecification.
So, if Pascal wants to ignore them, he can -- he just has to document that
fact.
****************************************************************
From: Robert Dewar
Sent: Monday, February 9, 2004 3:23 PM
OK, sorry, missed this, then I have no objection to any of the
statements, though there is still a bit of over-specification
I would say :-)
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 4:27 PM
By the way, I find this discussion somewhat frustrating, because there
are discussions going on in ada-comment, and also on arg. People are
raising some of the same points on both. It seems like the ARG should
pay a lot of attention to real users on this issue, but I fear some key
ARG members are not currently listening to ada-comment, and many
ada-comment folks are not seeing the arg mailing list.
Sigh.
Anyway, Pascal Leroy said:
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind. It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical!
I'm responding to Pascal's message, because it makes the point so
clearly, but this is really a more general comment.
This is the *usual* view of language design, and the usual view in the
Ada RM -- we specify the high-level semantics, and not the efficiency of
things.
However, I think for a container library, efficiency properties are the
key issue.
Consider "sequences" -- an ordered sequence of items, which can in
principle be numbered from 1 to N (or 0 to N-1, if the programmer
prefers). There are many possible implementations of "sequence" --
singly-linked lists, doubly-linked lists with dummy header, growable
arrays, fixed-size arrays, etc. Programmers choose among those
primarily for efficiency reasons.
Therefore, I think we should be thinking about a secondary standard that
contains a variety of "sequence" packages. Each should be named
according to the intended implementation, so the programmer can choose
wisely. We're saying "vector" (meaning "array-based" or "contiguous
hunk of storage") should be the one in the next RM -- but we expect
others, like linked lists.
So I disagree with Pascal above -- I think the container packages
*should* have a particular implementation in mind. I'll even go further
than Randy, and say that instead of "O(1) access" I really want "a
vector/array-based implementation".
Now, you may say that's overspecification. Why shouldn't the
implementer choose a "better" implementation? Well, for containers,
there is no "better" -- they just have different efficiency properties
(better for some uses, worse for others). As a programmer, I need to
know the underlying implementation.
The language designer cannot know which implementation of sequences is
"better". Nor can the implementer. Only the programmer can know.
Therefore, we should not let implementers choose, here.
If one implementer chooses "arrays, deallocated and reallocated when
growing" and the other implementer chooses "skip lists", it's a disaster
-- the programmer has no idea which package to choose.
I say, the vectors package should say (as Implementation Advice) "the
intended implementation is as an array", rather than saying something
about O(1) access. As others have pointed out, there's really no such
thing as O(1) random access -- if you make the vector big enough, you
will get O(log N) because of cache or paging effects.
Then a secondary standard can define 17 other varieties of "sequence"
that have different efficiency properties. None is "best" for all
purposes. However, it is desirable that they all have interfaces that
are as similar as possible.
SUMMARY: Don't let implementers choose the one be-all end-all sequence
package. We choose a particular sequence implementation
(vectors/arrays) that is useful, and let a secondary standard build all
the others. Let the programmer choose among them.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 6:01 PM
Robert Duff:
> By the way, I find this discussion somewhat frustrating, because there
> are discussions going on in ada-comment, and also on arg.
Besides Bob's "real user" concerns, I am faced with the aggravating task of
filing two unrelated threads on the same topic going on at the same time
into the same AI. I fear no one is going to be able to make sense out of the
!appendix section...
...
> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind. I'll even go further
> than Randy, and say that instead of "O(1) access" I really want "a
> vector/array-based implementation".
That's actually what the Implementation Advice says. But of course it is
Implementation Advice, so it has no force: Pascal can use a skip list if he
wants.
Going further than that would be useless and bad for at least some
implementations.
For instance, because of generic code sharing, the implementation of the
Vector type will essentially be an array of pointers. Because of that, I'll
probably implement this as an array of pointers, and use that to eliminate
copying in insert/delete/sort operations. Technically, that would still be a
correct implementation (insert would still be O(N), just the constant would
be a lot lower). But clearly, the ratio of execution times between the
various operations would be quite different for this package than for the
"canonical" implementation.
To avoid that, you'd pretty much have to specify the body of the package.
But even that doesn't really help. Again, looking at Janus/Ada, you're going
to get (implicit) allocations of the elements. So, for a Vector of
elementary, the cost of an Insert operation could be 20 times more than for
a non-sharing implementation. (While for a Vector of a type with an
expensive assignment, it might only be a few percent more.) For most uses of
the container, this difference in performance (which appears because the
unit is generic) is likely to matter more than the O(N) performance.
Of course, this is an extreme example, but it shows that the actual
performance of the container is going to depend heavily on the
implementation no matter what is specified in the standard. So going beyond
O(N) type specifications for key operations doesn't help, and could be
actively harmful (by preventing innovative implementations).
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 5:30 PM
Pascal wrote:
...
> 2 - I would really like it if the definition of containers were written
> without a particular implementation in mind. It's OK to explain that a
> Vector is logically an array, but _requiring_ that insertion at the
> beginning should take time O(N) is nonsensical! This is preventing
> possibly better implementations. I have also seen in a mail by Randy
> that element access has to be in O(1) (somehow I can't find this in the
> AI).
For the record, here's the wording from the AI. (I wrote this, Matt wanted
it, but didn't know how to express it. I'm not sure I do either - but I knew
I didn't want to define the O(N) notation...)
Implementation Advice
Containers.Vectors should be implemented similarly to an array. In particular,
the time taken by Append and Element should not depend on the number of
items in the Vector, and the time taken by Insert or Delete at First of the
vector should take time roughly proportional to the number of elements in the
vector.
And you are correct, the last part of the sentence should say "no worse
than" or something like that. (Although I can't think of any implementation
that meets the first part that doesn't also meet the second part exactly -
you can reduce the constant arbitrarily, but it still is proportional to N.)
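[Editor's note: A minimal sketch of what the advice above asks for. It uses
the Ada.Containers.Vectors package as later standardized in Ada 2005, not the
draft interface under discussion here, so every name in it is an assumption
relative to this thread. The code only demonstrates the operations; it does
not measure their cost.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Vector_Cost_Demo is
   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use type Ada.Containers.Count_Type;

   V : Integer_Vectors.Vector;
begin
   --  Append and Element: per the advice, the time taken should not
   --  depend on the number of items already in the vector.
   for I in 1 .. 1_000 loop
      V.Append (I);
   end loop;
   pragma Assert (V.Element (500) = 500);

   --  Insert at the first index: an array-like implementation has to
   --  slide the 1_000 existing elements up by one position, so the time
   --  is roughly proportional to the length.
   V.Insert (Before => V.First_Index, New_Item => 0);
   pragma Assert (V.Element (V.First_Index) = 0);
   pragma Assert (V.Length = 1_001);

   Ada.Text_IO.Put_Line
     ("Length:" & Ada.Containers.Count_Type'Image (V.Length));
end Vector_Cost_Demo;
```

[The assertions are only checked when the compiler enables them; they are
there to document the expected results, not to prove anything about timing.]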
> 4 - Like others, I don't like the type names ending in _Type (but I
> realize that's a matter of taste).
Our original idea was to avoid the "_Type". However, when I tried to do
that, there were a lot of conflicts with package, subprogram, and parameter
names. In the interests of getting a report done on time, we wanted to
avoid major surgery to the proposal. (Especially updating the examples would
be painful.) So we stuck with "_Type".
If there is a majority opinion that it is worth going forward with these
packages, and that changing the names would be preferred, then I can spend
the time to do it. But I don't want to spend the ARG's limited resources
doing major changes if all we're going to do is kill the proposal anyway. (I
would hope that no one votes against the proposal solely because they don't
like the names - although such a result wouldn't surprise me.)
****************************************************************
From: Robert Dewar
Sent: Monday, February 9, 2004 6:01 PM
Randy Brukardt wrote:
> If there is a majority opinion that it is worth going forward with these
> packages, and that changing the names would be preferred, then I can spend
> the time to do it. But I don't want to spend the ARG's limited resources
> doing major changes if all we're going to do is kill the proposal anyway. (I
> would hope that no one votes against the proposal solely because they don't
> like the names - although such a result wouldn't surprise me.)
It's always risky to vote for something that is flawed with the
expectation of fixing it.
On the other hand, at least one delegation in Salem that was strongly
in favor of adding the keyword CLASS to the language voted against
JDI's proposal because they did not like the prefix notation (I told
Jean not to mix up the issues, but he did not listen to me). They
were quite dismayed that the proposal failed. So you never know...
(interesting to wonder what would have happened at Salem if that
delegation had understood how the vote worked and voted their actual
interests, then the vote would have been 3-2 in favor of class X is ...
and the US was poised to follow the winning side, so the eventual vote
would have been 4-2 and who knows what would have happened?)
(sorry to digress, but it's an interesting little piece of Ada trivia
history :-)
****************************************************************
From: Robert Dewar
Sent: Monday, February 9, 2004 6:04 PM
Robert A Duff wrote:
> I'm responding to Pascal's message, because it makes the point so
> clearly, but this is really a more general comment.
>
> lots of sensible stuff deleted here
>
> SUMMARY: Don't let implementers choose the one be-all end-all sequence
> package. We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard build all
> the others. Let the programmer choose among them.
I find Bob's comments here to make a lot of sense, and I agree with
all of them (yes I know that's a change in position, but I think the
fact that this is IA, and Bob's useful perspective make the difference).
****************************************************************
From: Pascal Leroy
Sent: Monday, February 9, 2004 4:58 AM
Bob chided me:
> By the way, I find this discussion somewhat frustrating,
> because there are discussions going on in ada-comment, and
> also on arg. People are raising some of the same points on
> both. It seems like the ARG should pay a lot of attention to
> real users on this issue, but I fear some key ARG members are
> not currently listening to ada-comment, and many ada-comment
> folks are not seeing the arg mailing list.
Sorry, the signal/noise ratio on Ada-Comment is too poor, I admit that I
don't have the patience to read all that stuff, and I didn't want to get
50 replies to my initial message. Anyway, to avoid confusion, I promise
I will shut up until this topic is discussed face-to-face in Phoenix.
Randy pointed out:
> All of the performance "requirements" *are* written as
> Implementation Advice. There isn't any way that I can think
> of to make them normative, and in any case, that would be
> overspecification.
I realize that they are IA, and that's fine. I am just arguing that the
advice as written excludes perfectly good implementations. Of
course I can ignore them, but that's not a satisfactory answer to me: if
we put them in the RM they should be useful.
Tuck commented:
> I agree we shouldn't overspecify, but nor should we
> underspecify. We need to specify enough to establish useful,
> reasonable expectations for implementors and users, so the
> container library is not just a toy, but is actually a useful
> part of the professional Ada programmer's toolkit.
I completely agree with this principle. The performance advice is
only there to prevent "bad" implementations. It should not constrain
"good" implementations. For instance, using a bubble sort is a no-no,
but we want an implementer to be able to use heapsort, quicksort or
shellsort (or a combination of the three). Similarly, a Vector should
not be implemented using a simple linked list, but an array or a skip
list are both valid implementations.
> If you believe O(Log N) is acceptable, we can consider that.
As others have pointed out, O(1) and O(Log N) are hardly distinguishable
in practice, it's only the multiplicative factor that counts, so yes, I
believe that we should allow O(Log N) access for vectors.
Back to Bob:
> This is the *usual* view of language design, and the usual
> view in the Ada RM -- we specify the high-level semantics,
> and not the efficiency of things.
>
> However, I think for a container library, efficiency
> properties are the key issue.
I don't see what makes a container library so different from all the
rest. Let me draw your attention to the fact that we don't specify
efficiency properties for the string packages, or for the numerics
(including the matrix operations of AI 296). I know that Bob doesn't do
numerics, but for people who do, the performance of these libraries is
likely to be more critical than that of containers. In practice what
happens is that they run benchmarks, and talk sternly to their vendor if
they don't like the results.
> Therefore, I think we should be thinking about a secondary
> standard that contains a variety of "sequence" packages.
> Each should be named according to the intended
> implementation, so the programmer can choose wisely. We're
> saying "vector" (meaning "array-based" or "contiguous hunk of
> storage") should be the one in the next RM -- but we expect
> others, like linked lists.
You are on the right track to kill this proposal with kindness ;-)
> So I disagree with Pascal above -- I think the container packages
> *should* have a particular implementation in mind. I'll even
> go further than Randy, and say that instead of "O(1) access"
> I really want "a vector/array-based implementation".
But what do you gain if you don't specify the multiplicative factor? I
have this wonderful implementation of vectors, I swear it's O(1), but
for some reason the multiplicative constant is such that it takes 1 sec
on average to access an element. This is a Duff-compliant
implementation, but hardly a good one.
Surely you don't want to get into the business of specifying the factor,
right? Unless of course your target is a MIX computer ;-)
> Now, you may say that's overspecification. Why shouldn't the
> implementer choose a "better" implementation? Well, for
> containers, there is no "better" -- they just have different
> efficiency properties (better for some uses, worse for
> others).
The same is true for everything. For the elementary functions, you have
a trade-off between speed and accuracy. Which is best? Depends on the
application. For the random numbers, there is a trade-off between
speed, size of the generator, and quality of the random numbers. Again,
there is no better implementation.
> If one implementer chooses "arrays, deallocated and
> reallocated when growing" and the other implementer chooses
> "skip lists", it's a disaster
> -- the programmer has no idea which package to choose.
Either the programmer doesn't care, for instance because they only put a
few elements in the vector, and both implementations are fine (that's
Randy's viewpoint, I think). Or the programmer does care, and he better
run a simple benchmark with, say, a 10-million-element vector, and see
what happens.
> SUMMARY: Don't let implementers choose the one be-all end-all
> sequence package. We choose a particular sequence implementation
> (vectors/arrays) that is useful, and let a secondary standard
> build all the others. Let the programmer choose among them.
SUMMARY: For once, I disagree with just about everything that Bob wrote.
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 10, 2004 8:35 AM
> Bob chided me:
I didn't mean to chide you in particular. In fact, I didn't mean to
chide anybody. I was merely lamenting the fact that there is no forum
where the public (i.e. ada-comment folks) and the arg can discuss the
issue of containers. Sorry.
> > However, I think for a container library, efficiency
> > properties are the key issue.
>
> I don't see what makes a container library so different from all the
> rest. Let me draw your attention to the fact that we don't specify
> efficiency properties for the string packages, or for the numerics
> (including the matrix operations of AI 296). I know that Bob doesn't do
> numerics, but for people who do, the performance of these libraries is
> likely to be more critical than that of containers. In practice what
> happens is that they run benchmarks, and talk sternly to their vendor if
> they don't like the results.
It seems to me that for most features of the language, either efficiency
doesn't matter all that much, or else it's fairly obvious what the
efficiency properties will be. I *know* (roughly) how compilers
represent integers, arrays, records, etc. But there are many wildly
different ways to represent sequences. I don't want Vectors represented
as skip lists any more than I want built-in arrays implemented as skip
lists.
There are a few cases like this in Ada already. One example is
size-changing records (i.e. defaulted discriminants). Some compilers
choose an allocate-the-max-size strategy, and others choose a heap-based
strategy. The former is unacceptable when the max size is 2**33 bytes.
The latter is unacceptable in real-time systems that don't want heap
allocation, or whenever the extra level of indirection is too costly.
It's not obvious which implementation choice is "better".
If Ada 83 had specified (as a NOTE or whatever) which choice the
language designers expected, use of this feature would have been
much more portable.
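[Editor's note: A minimal sketch of the kind of "size-changing record" meant
here. The type and object names are made up for illustration. The range of
the discriminant is deliberately small, so that either implementation
strategy (allocate the maximum size, or allocate on the heap) is cheap.]

```ada
procedure Mutable_Record_Demo is
   subtype Line_Length is Natural range 0 .. 80;

   --  The default on Len is what makes unconstrained objects of this
   --  type "mutable": assigning a whole new value may change Len, and
   --  therefore the amount of storage the object needs.
   type Line (Len : Line_Length := 0) is record
      Text : String (1 .. Len);
   end record;

   L : Line;  --  unconstrained; starts out with Len = 0
begin
   L := (Len => 5, Text => "hello");
   L := (Len => 2, Text => "hi");
   pragma Assert (L.Len = 2 and then L.Text = "hi");
end Mutable_Record_Demo;
```

[If Line_Length were replaced by plain Natural, an allocate-the-max-size
implementation would need room for the largest possible string in every
object of the type, which is the portability problem described above.]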
As to numerics, I don't know what I'm talking about, but I know that the
numerics annex is full of accuracy requirements. Isn't the
implementer's goal simply "as fast as possible, given the accuracy
requirements"? Are there wildly different implementation strategies?
I was under the impression that it's more like, "spend more money,
make the algorithms incrementally faster".
As to matrices, I don't know what I'm talking about there, either,
but don't we want all vendors to use a two-dimensional array?
An implementer that chose a sparse representation wouldn't be
doing any favors, right?
...
> But what do you gain if you don't specify the multiplicative factor? I
> have this wonderful implementation of vectors, I swear it's O(1), but
> for some reason the multiplicative constant is such that it takes 1 sec
> on average to access an element. This is a Duff-compliant
> implementation, but hardly a good one.
I trust implementers not to *deliberately* sabotage their products.
But implementers need to understand what's expected of them.
I want to say that for Vectors, an array-based implementation is
expected -- we're not asking for the world's greatest all-purpose
sequence package here; we're asking for growable arrays.
> Surely you don't want to get into the business of specifying the factor,
> right? Unless of course your target is a MIX computer ;-)
Agreed.
> SUMMARY: For once, I disagree with just about everything that Bob wrote.
Oh, well. :-(
****************************************************************
From: Pascal Leroy
Sent: Tuesday, February 10, 2004 10:14 AM
> I don't want Vectors represented as skip lists
> any more than I want built-in arrays implemented as skip lists.
But why? You have to explain why, you cannot just say "I don't want".
When I look at the specification of Vectors, the first implementation
that comes to my mind is to use an array if the vector is not too large,
and dense enough. If it becomes too large I would probably want to
switch to a skip list implementation: this would avoid the unreasonable
O(N) cost on insertion/deletion. Similarly if the vector becomes very
sparse (not many active elements), I would switch to a skip list
implementation to save space (and indexing would become a bit more
costly).
Of course the skip list would not store individual elements, but
probably chunks that have sufficiently high density.
Surely there are a number of parameters/thresholds to be selected to do
the switch from array to skip list, but they should be easy to select by
graphing the space/time characteristics of each algorithm and looking
for the point where these characteristics intersect.
Incidentally we do this for string search: for small strings we use the
naïve algorithm, and when the string becomes large we switch to
Boyer-Moore. You could call this overengineering, but as a user I don't
see why you would complain.
Now you've got to explain to me what is wrong with this approach. Let
me say it again: this is the first thought that comes to my mind when I
read the specification of Vectors, so I'd like to be educated.
****************************************************************
From: Robert Dewar
Sent: Tuesday, February 10, 2004 10:23 AM
Surely everyone would prefer a skip list if using a contiguous vector
would force page faults for every access to the vector.
I assume the real interest here is speed, not O(1) at the cost of
any constant.
Anyway this is only IA :-)
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 10, 2004 11:11 AM
Pascal wrote:
> But why? You have to explain why, you cannot just say "I don't want".
I've got nothing against skip lists. What I really want is
uniformity of efficiency across implementations. The only way I know
how to achieve that is for the programmer to choose among basic
implementation strategies.
> When I look at the specification of Vectors, the first implementation
> that comes to my mind is to use an array if the vector is not too large,
> and dense enough. If it becomes too large I would probably want to
> switch to a skip list implementation: this would avoid the unreasonable
> O(N) cost on insertion/deletion. Similarly if the vector becomes very
> sparse (not many active elements), I would switch to a skip list
> implementation to save space (and indexing would become a bit more
> costly).
OK, now you're talking about a hybrid strategy. I don't see how the
implementation could know about sizes and densities at compile time, so
I assume what you mean is that the Vector implementation gathers
statistics at run time, and switches among different strategies based on
that information.
The overhead of gathering statistics and checking them at relevant times
is worth it in some cases, and not in others. All I'm saying is that
only the programmer can make that choice. In my current project, we use
growable arrays that are almost always quite small. The above "fancy"
implementation would be inappropriate.
Now if you say the "fancy" implementation is a good one, fine, then the
RM should encourage *all* implementers to use it. Then I, as a
programmer, can know that I don't want to use the language-defined
Vectors package. In other cases, I can decide that the language-defined
package is appropriate. But if I have no idea what the underlying
implementation is, I can *never* use the language-defined package (except
perhaps in toy programs that don't care about efficiency, or portability
thereof).
> Of course the skip list would not store individual elements, but
> probably chunks that have sufficiently high density.
>
> Surely there are a number of parameters/thresholds to be selected to do
> the switch from array to skip list, but they should be easy to select by
> graphing the space/time characteristics of each algorithm and looking
> for the point where these characteristics intersect.
>
> Incidentally we do this for string search: for small strings we use the
> naïve algorithm, and when the string becomes large we switch to
> Boyer-Moore. You could call this overengineering, but as a user I don't
> see why you would complain.
Well, I suppose I wouldn't complain about that.
> Now you've got to explain to me what is wrong with this approach. Let
> me say it again: this is the first thought that comes to my mind when I
> read the specification of Vectors, so I'd like to be educated.
There's nothing wrong with that approach (I assume we're talking about
the Vector case, not the string-search case). But if you choose that
approach, and some other compiler-writer chooses a wildly different
approach, the programmer will be lost.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 6:54 PM
Jeffrey Carter:
> Randy Brukardt wrote:
> >
> > Huh? You've said, in effect, that the performance isn't good enough
> > for applications where the performance doesn't matter. That's a
> > pretty goofy statement!
>
> Actually, you originally said something like that. You have said
>
> 1. That the vector component should only be used by applications where
> performance doesn't matter.
>
> 2. That the difference in performance between possible implementations
> of vector may be critical to applications that use it.
>
> If performance doesn't matter to these applications, then the
> restriction on implementations should be removed. However, I agree with
> you that even applications that are suitable for the use of standard
> components may find the performance difference between different
> implementations critical.
That's what I get for trying to argue a position that I don't believe in.
My position is that performance does not matter for these components.
Period.
However, that's a minority position, and I understand the other argument.
The trouble with including performance is that then you must have enough
container forms to handle the most common performance profiles - that means
at least 4 sequence containers (and probably more - at least bounded and
unbounded forms, and list and vector forms) and similarly at least 8
associative containers, and we simply don't have the manpower to properly
specify such a library.
But in any case, I'm obviously not good at arguing the current position,
and I'm not going to try anymore.
---
That said, my opinion is that the only container worth having (with no
performance requirements) is a map. The Set isn't sufficiently different,
and no sequence container is worth the effort. And such a container probably
ought to hold indefinite elements. (Performance doesn't matter, remember.)
But that position is a minority position (of one!), and I'm not going to
argue that, either.
****************************************************************
From: Robert A. Duff
Sent: Monday, February 9, 2004 7:13 PM
> That's what I get for trying to argue a position that I don't believe in.
;-)
> My position is that performance does not matter for these components.
> Period.
>
> However, that's a minority position, and I understand the other argument.
I'm afraid that I take the opposite position: efficiency is the key
issue. I'll take the liberty of reposting my response on arg here:
[Editor's note: This word-for-word repeat of 50+ lines is removed to keep these
comments manageable. You can find it about 600 lines back; look for "This is
the *usual* view of language design..." in a message from Bob on Monday
at 4:27 PM]
> The trouble with including performance is that then you must have enough
> container forms to handle the most common performance profiles - that means
> at least 4 sequence containers (and probably more - at least bounded and
> unbounded forms, and list and vector forms) and similarly at least 8
> associative containers, and we simply don't have the manpower to properly
> specify such a library.
I'm saying we should lead the way toward those 4 or 8, as opposed to
trying to be the last word on "sequences" or "mappings" or etc.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 7:18 PM
Jeffrey Carter wrote:
> Regarding Size and Resize, you wrote:
>
> > That's no different than many of the attributes in Ada, which (if set),
> > always return the values that they were set to. But what the compiler does
> > with those values is (almost) completely implementation-defined.
>
> There is a difference between a compiler directive and an operation of a
> package. The latter must have well defined behavior that is not
> implementation defined.
That's a goofy statement. There are lots of package operations in Ada that
have implementation-defined behavior. Try any file Open or anything in
Ada.Command_Line, for instance.
> > Huh? Resize tells the container a reasonable size to use; what the container
> > does with that information is up to it. Size simply returns that information.
>
> What does Size return if Resize has not been called?
The implementation-defined initial size of the container.
Note that there is still quite a bit of overspecification in some of the
wording. I didn't have the time or energy to rewrite every second line of
Matt's proposal, and it wasn't clear that I had the support of the committee
to do so, either.
> If the intention is as you described, then the operations appear to be
> useless, and should be eliminated.
Why? Giving a container an idea of how many elements it will contain can be
a great efficiency help. But there shouldn't be any specification of what it
will mean.
> The introductory text to Vectors does not make it clear that this is an
> extensible array (EA).
Probably because no one uses such a term! The first time I can recall anyone
talking about extensible arrays in my 25+ years of programming (including my
college courses) was last week. I of course know what is meant because the
words have their conventional meanings, but I doubt that there are many
people out there looking up "extensible array" in an index!
...
> So, if the ARM gains a mathematical library of matrices and vectors,
It already did. See AI-296, already approved. (Note that this is an old Ada
83 standard that has not been widely used - but the fact remains that Ada
has had vectors in the mathematical sense for a long time.)
> However, this is really a general problem, and a general solution might
> be advisable. There are no predefined modular types in Standard, so we
> might want to add
>
> type Maximal_Count is mod implementation-defined;
Adding types to Standard is dangerous, because they hide ones visible via a
use-clause. We're not planning to add anything named to Standard for this
reason. Adding it to Ada could cause trouble if there is a use clause for
Ada in a program. So, I'd suggest such a type be added to Ada.Containers
(next to Hash_Type).
> I don't understand why the string-keyed maps exist, since they are
> equivalent to a map with an unbounded string key. The implementation
> would have to store the provided key in an appropriate unbounded string,
> or duplicate the functionality of unbounded strings.
No, a stringspace implementation would be much better than Unbounded_String
for storing large numbers of strings of unknown length. That's precisely the
idea of this component (and the reason it exists separately).
Unbounded_Strings require many tiny allocations, while a stringspace
implementation requires just one (or a few) larger ones.
...
> This discussion of the searchable structure and the map based on it
> seems to indicate a basic design problem with the hashed map component.
> A hash table is not trivial to implement correctly. There are uses for
> hash tables other than maps. As it stands, the user who wants a hash
> table must create one, duplicating the effort performed for the map, and
> increasing the likelihood of errors.
Huh? What could you do with a separate hash table that you couldn't do with
a map? The hash "buckets" contain *something*, and that something is (or can
be) the same as the map elements.
I suspect that if you try to develop this separate container, you'll end up
with pretty much the same interface as map - so there is no reason for a
separate version.
****************************************************************
From: Jeffrey Carter
Sent: Tuesday, February 10, 2004 12:52 PM
Randy Brukardt wrote:
> That's a goofy statement. There are lots of package operations in Ada
> that have implementation-defined behavior. Try any file Open or
> anything in Ada.Command_Line, for instance.
At least I'm consistent :) I agree. In retrospect, I worded that badly,
using general terms when referring to specifics.
>>> Huh? Resize tells the container a reasonable size to use; what the container
>>> does with that information is up to it. Size simply returns that information.
>
>> What does Size return if Resize has not been called?
>
> The implementation-defined initial size of the container.
OK. Let's see if I understand your position correctly. Resize gives the
implementation a hint about a reasonable size to use, but the
implementation may do whatever it wants, including nothing. Size returns
the actual size of something if Resize has not been called, but the last
size given to Resize if Resize has been called, regardless of what the
implementation does (or doesn't) do with that size.
So it appears that you are saying the implementation is required to keep
track of whether Resize has been called, and to store the size passed to
Resize. That doesn't seem like a very useful requirement to me.
It's fun to argue this kind of thing, but we're really wasting time. My
concern is not really what you think should be required, but what the
proposal actually requires.
> Giving a container an idea of how many elements it will contain can
> be a great efficiency help. But there shouldn't be any specification
> of what it will mean.
That's fine. But the specification of Resize requires that it perform an
allocation. That's primarily why these operations concern me.
Allowing the user to know the current size doesn't seem very useful to
me, but I don't see how it can hurt. Allowing the user to force a resize
does seem unwise.
Resize is an appropriate name for the operation as specified. I expect
an operation named Resize to cause resizing. If we're really talking
about giving the implementation a hint about an appropriate size, then
not only does the specification need to be changed, the name also needs
to be different (perhaps Size_Hint?).
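[Editor's note: For comparison, a sketch using the Capacity and
Reserve_Capacity operations of Ada.Containers.Vectors as later standardized
in Ada 2005. That these play the role of the Size and Resize operations
discussed here is an assumption; the sketch only shows the "hint, then query"
usage pattern, with a weak lower-bound check on the reported capacity.]

```ada
with Ada.Containers.Vectors;

procedure Capacity_Hint_Demo is
   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use type Ada.Containers.Count_Type;

   V : Integer_Vectors.Vector;
begin
   --  Tell the container roughly how many elements to expect.
   V.Reserve_Capacity (1_000);

   --  Only a lower bound is checked: the container may have reserved
   --  more than was requested. Reserving room adds no elements.
   pragma Assert (V.Capacity >= 1_000);
   pragma Assert (V.Length = 0);

   for I in 1 .. 1_000 loop
      V.Append (I);
   end loop;
   pragma Assert (V.Length = 1_000);
end Capacity_Hint_Demo;
```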
> Note that there is still quite a bit of overspecification in some of
> the wording. I didn't have the time or energy to rewrite every second
> line of Matt's proposal, and it wasn't clear that I had the support
> of the committee to do so, either.
Right, and most of my comments were identifying such areas and
presenting alternative wording. I hope, as such, they are useful. I can
understand you not being able to correct all of these, but if they are
not corrected, the current proposal is unacceptable.
Normally, I would think the original author would be the best person to
make such changes. However, Heaney's response to suggestions that the
proposal could be improved has uniformly been that the proposal is
correct as it stands (although I see that after saying that the vector
doesn't need an iterator, he has now added an iterator to his reference
implementation). Perhaps he is more amenable to requests for
modifications from the select committee.
I do have time at the moment, and am willing to make the effort if that
is desired. The committee needs to ask, since I'm unwilling to waste the
effort.
> Probably because no one uses such a term! The first time I can recall
> anyone talking about extensible arrays in my 25+ years of programming
> (including my college courses) was last week. I of course know what
> is meant because the words have their conventional meanings, but I
> doubt that there are many people out there looking up "extensible
> array" in an index!
Surely I didn't invent the term! I agree with you, though. This is a
case where I'm familiar with the concept and have used versions of it
for decades, but I've never encountered a general name for it, except I
know it's not a vector. By analogy to unbounded strings, perhaps
unbounded array is best.
> It already did. See AI-296, already approved. (Note that this is an
> old Ada 83 standard that has not been widely used - but the fact
> remains that Ada has had vectors in the mathematical sense for a long
> time.)
Good. I was not aware of this standard. However, this simply reinforces
my opposition to calling unbounded arrays "vectors".
> Adding types to Standard is dangerous, because they hide ones visible
> via a use-clause. We're not planning to add anything named to
> Standard for this reason. Adding it to Ada could cause trouble if
> there is a use clause for Ada in a program. So, I'd suggest such a
> type be added to Ada.Containers (next to Hash_Type).
OK. This should be useful for more than containers, so I'd like to see
it somewhere higher in the hierarchy, though the most important thing is
to avoid defining such types all over the place, like the Count types in
the IO packages. The odds of a conflict if it's in Ada are small, so I
wouldn't think that would be a problem. If the ARG/committee objects to
putting it in Ada, perhaps there should be a special child package for
such things.
> No, a stringspace implementation would be much better than
> Unbounded_String for storing large numbers of strings of unknown
> length. That's precisely the idea of this component (and the reason
> it exists separately). Unbounded_Strings require many tiny
> allocations, while a stringspace implementation requires just one (or
> a few) larger ones.
In general, a key is added to a map only once, and never modified. Using
Unbounded_String would, therefore, only need one allocation per key, so
I don't see that many tiny allocations are needed. However, you probably
know more about this sort of thing, since compilers need to do this kind
of thing a lot, so I may well be mistaken.
> Huh? What could you do with a separate hash table that you couldn't
> do with a map? The hash "buckets" contain *something*, and that
> something is (or can be) the same as the map elements.
Suppose I want to store Integers in a hash table so I can determine if
I've seen one before. There is no mapping from Integers to anything
else. Yes, I can do that with a map, by providing a dummy type for the
element type, and a dummy value for the element parameters, but that's
an ugly kludge. Defining a map in terms of a hash table is neither ugly
nor a kludge.
> I suspect that if you try to develop this separate container, you'll
> end up with pretty much the same interface as map - so there is no
> reason for a separate version.
A hash table doesn't have an operation to obtain an element given a key,
for example. I could agree that there's no reason for a separate map
given a hash table, since maps are trivial to implement with a hash
table, but ideally I'd like to see both. Ada can do better than
expecting the use of ugly kludges.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 10, 2004 1:08 PM
Jeffrey Carter wrote:
> Normally, I would think the original author would be the best person to
> make such changes. However, Heaney's response to suggestions that the
> proposal could be improved has uniformly been that the proposal is
> correct as it stands (although I see that after saying that the vector
> doesn't need an iterator, he has now added an iterator to his reference
> implementation). Perhaps he is more amenable to requests for
> modifications from the select committee.
I said a vector doesn't need an *active* iterator. My opinion on that
matter hasn't changed: active iterators (aka "cursors") are too
error-prone for (array-based) vectors.
I wasn't sure whether we needed *passive* iterators for a vector, since
Ada already provides a built-in for loop. However, there has been
interest, and so *passive* iterators were added.
> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before.
Use a set, not a map.
The latest version of the reference implementation now supports the
stream attributes for containers.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>
****************************************************************
From: Jeffrey Carter
Sent: Tuesday, February 10, 2004 6:35 PM
Matthew Heaney wrote:
> I said a vector doesn't need an *active* iterator. My opinion on
> that matter hasn't changed: active iterators (aka "cursors") are too
> error-prone for (array-based) vectors.
>
> I wasn't sure whether we needed *passive* iterators for a vector,
> since Ada already provides a built-in for loop. However, there has
> been interest, and so *passive* iterators were added.
The actual things said were:
>> Vector should have an iterator, in addition to allowing the user to
>> explicitly iterate over the structure.
>
> No. Vector iterators are fragile, and hence very error prone.
>
> They are fragile because the (logical) internal array gets thrown
> away during expansion, which invalidates the iterator. It's too hard
> to keep track of whether a vector iterator is still valid, and most
> of the time you end up with a dangling reference.
I was discussing the proposal in AI-302-03, so of course I used its
terminology. I did not mention cursors, nor did you.
You should also look "active" and "passive" up in a good dictionary.
Then perhaps you would discover what they mean, and realize that cursors
are passive and procedures are active.
Precision in terminology is important.
>> Suppose I want to store Integers in a hash table so I can determine
>> if I've seen one before.
>
> Use a set, not a map.
A typical answer: the proposal is perfect, therefore any problem a user
has with it must be with the user, not the library.
Yes, I want a set, but I want a hashed set, not one based on an O(log N)
search, perhaps because I know that with my hash function and expected
distribution of values, I can expect O(1) from a hash table.
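[Editor's note: A minimal sketch of the "have I seen this Integer before"
use case with a hashed set. It assumes the Ada.Containers.Hashed_Sets
package of later Ada standards (Ada 2005 onward), which is not part of the
proposal under discussion here. The Hash function and the input values are
hypothetical and chosen only for illustration.]

```ada
with Ada.Containers.Hashed_Sets;
with Ada.Text_IO;

procedure Seen_Before is

   --  Hypothetical hash function: maps any Integer, including negative
   --  values, into the modular range of Hash_Type.
   function Hash (Item : Integer) return Ada.Containers.Hash_Type is
   begin
      return Ada.Containers.Hash_Type'Mod (Item);
   end Hash;

   package Integer_Sets is new Ada.Containers.Hashed_Sets
     (Element_Type        => Integer,
      Hash                => Hash,
      Equivalent_Elements => "=");

   type Integer_List is array (Positive range <>) of Integer;

   --  Illustrative input only.
   Input : constant Integer_List := (3, 7, 3, -1, 7, 42);

   Seen : Integer_Sets.Set;
begin
   for I in Input'Range loop
      if Seen.Contains (Input (I)) then
         --  Membership test only: no dummy element type is needed.
         Ada.Text_IO.Put_Line (Integer'Image (Input (I)) & " seen before");
      else
         Seen.Insert (Input (I));
      end if;
   end loop;
end Seen_Before;
```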
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:33 AM
The terms "active iterator" and "passive iterator" are discussed in
section 7.3, Variations on a Theme: Iterators, of Grady Booch's book
describing the original Booch Components library:
Software Components With Ada
Grady Booch
Benjamin/Cummings Publishing Company 1987
p. 157-8: "Basically, there are two approaches to iteration, called
active and passive. In the active approach, we expose the iterator as a
collection of primitive operations, but, in the passive approach, we
export only a single operation."
p. 158: "We shall first discuss the active iterator. The iterator can
be considered an object of an abstract data type, characterized by the
following operations: Initialize, Get_Next, Value_Of, Is_Done."
p. 159: "With the passive iterator, rather than exporting the type
Iterator and its associated operations, we instead export a single
generic procedure that is nested in the specification of the queue
component."
The Iterator design pattern (aka "Cursor") is described in:
Design Patterns: Elements of Reusable Object-Oriented Software
Erich Gamma et al
Addison-Wesley Publishing Company 1995
p. 260: "Who controls the iteration? A fundamental issue is deciding
which party controls the iteration, the iterator or the client that uses
the iterator. When the client controls the iteration, the iterator is
called an external iterator, and when the iterator controls it, the
iterator is an internal iterator. [footnote on p.260: Booch refers to
external and internal iterators as active and passive iterators,
respectively. The terms "active" and "passive" describe the role of the
client, not the level of activity of the iterator.] Clients that use an
external iterator must advance the traversal and request the next
element explicitly from the iterator. In contrast, the client hands an
internal iterator an operation to perform, and the iterator applies that
operation to every element in the aggregate."
The footnote in Gamma was referring to the information in the section
Iteration in Chap. 9 (Frameworks) of:
Object-Oriented Analysis and Design with Applications, 2nd ed
Grady Booch
Benjamin/Cummings Publishing Company 1994
p. 356: "For each structure, we provide two forms of iteration.
Specifically, an active iterator requires that clients explicitly
advance the iterator; in one logical expression, a passive iterator
applies a client-supplied function, and so requires less collaboration
on the part of the client. [footnote on p. 356: Passive iterators
implement an "apply" function, an idiom commonly used in functional
programming languages.]"
Section 8.3.6 (Iterators) of the Ada95 Quality and Style Guide explains
the difference between active iterators and passive iterators as
follows:
"The terms active and passive are used to differentiate whether the
iteration mechanism (i.e., the way in which the complex data structure
is traversed) is exposed or hidden. A passive iterator hides the
traversal (e.g., looping mechanism) and consists of a single operation,
iterate, that is parameterized by the processing you do on each element
of the data structure. By contrast, an active iterator exposes the
primitive operations by which you traverse the data structure (Booch
1987)."
<http://www.adaic.com/docs/95style/html/sec_8/8-3-6.html>
My article at adapower.com, "Iterator and Factory Method Patterns
Combined," describes the difference between an active and passive
iterator as follows:
"There are two kinds of iterators: passive and active. A passive iterator
controls the actual movement within the data structure, and all a client
has to do is supply a procedure to receive each item in turn.
"An active iterator moves the responsibility for movement onto the
client. Unlike a passive iterator, which is essentially just a generic
subprogram, an active iterator is an actual type, with primitive
operations for retrieving the current item and for moving to the next
item in the sequence."
<http://www.adapower.com/alg/activeiter.html>
The "Algorithms and Data Structures I" (CS 131) course at the Dept of
Computer Science of The George Washington University has this to say
about the distinction between passive and active iterators:
"The linked-list package introduced in Section 8.2 provides an operation
called Traverse, which moves through the list, from beginning to end,
one element at a time, until each element has been "visited" exactly once.
"Formally, this Traverse operation is an example of a passive iterator
operation. An iterator is any operation that iterates through a data
structure one element at a time; we call it passive because the client
program simply calls it once and "stands back" passively while the
iterator roams through the entire structure. In this note, we use the
terms traversal and iteration interchangeably.
"Sometimes an application requires iterating through a structure,
touching each element once, but allowing the client program the
flexibility to decide just when to proceed to the next element. Moving
through a structure in this fashion is called active iteration, because
the client program is actively involved in the process at every step.
Active Iterator Operations: To be actively involved in the iteration,
the client program must execute a loop. We know that any loop must
contain statements for loop initialization, termination, and
incrementation; to support active iteration, the data structure package
must provide these operations, and also one for retrieval of the current
element in the traversal."
<http://www.seas.gwu.edu/~csci131/fall01/active-traversals.html>
The "Advanced Object-Oriented Design & Programming" (CS 635) at San
Diego State University says this about passive iterators:
"Neither Java nor C++ support passive iterators. Smalltalk does support
them. In a passive iterator, you pass a method or function to the
composite object, and the object then applies the method to all elements
in the object."
<http://www.eli.sdsu.edu/courses/spring01/cs635/notes/object/object.html>
In the topic "Generic Programming: Iterators" in the CS 412/512 course
at Old Dominion University, section 1.1 defines passive and active
iterators this way:
"Iterators can be:
o passive: we pass a function to the iterator and tell it to apply the
function to each item in the collection
o active: we ask the iterator to give us items, and each time it does,
we apply the desired function to it."
<http://www.cs.odu.edu/~zeil/cs412/Lectures/09iterators/iterators_summary.pdf>
In his description of the Bedrock framework for Macintosh apps, Scott
L. Taylor describes the iterators of the C++ Booch Components as
follows:
"Each structure comes with its own form of an iterator that allows
traversal of items within a structure. Two types of iterators are
provided for each structure, passive and active. Passive iterators
require much less interaction on the part of the client. A passive
iterator is instantiated and used by calling the iterator's apply()
method with a function pointer to the function to apply to all the
elements within the structure. Active iterators allow much more
flexibility but require more interaction from the client. Active
iterators must be told to go on to the next item, and the iterator
object returns a reference to each item in the structure for the client
to process or use. Active iterators are very similar to MacApp style
iterators."
<http://www.mactech.com/articles/frameworks/7_6/Booch_Components_Taylor.html>
The iterators of the container classes in the ET++ framework are
described like this:
"There are two types of iterators - passive and active iterators. The
latter provide methods for iterating to be called directly by the client
while with passive iterators the client provides a method to be called
on each element in the container."
<http://swt.cs.tu-berlin.de/~ron/diplom/node58.html>
In "An Overview of the Booch Components for Ada95," the iterators are
described this way:
"There are two forms: active and passive. Active iteration requires the
client explicitly advance the iterator. For passive, the client supplies
a single function "Apply" to work across the structure."
<http://www.rivatech.com/booch/documentation.html>
<http://www.pogner.demon.co.uk/components/bc/documentation.html>
> Precision in terminology is important.
Indeed, and my use of the terms "active iterator" and "passive iterator"
is consistent with the references cited above.
> Yes, I want a set, but I want a hashed set, not one based on an O(log N)
> search, perhaps because I know that with my hash function and expected
> distribution of values, I can expect O(1) from a hash table.
My original proposal had hashed sets. (It also had sorted maps.)
However, in order to reduce the scope of the change to the language
standard, the size of the proposal was reduced, and hashed sets didn't
make the cut. No one got every container they wanted, not even me.
If you need a hashed set right now, then just grab the hash table from
the reference implementation and assemble it yourself.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 12:56 PM
Matthew Heaney wrote:
> The terms "active iterator" and "passive iterator" are discussed in
> section 7.3, Variations on a Theme: Iterators, in his book describing
> the original Booch Components library:
>
> Software Components With Ada
> Grady Booch
> Benjamin/Cummings Publishing Company 1987
I'm familiar with Booch and the many errors he made in this book. I'm
also aware that many others are unable to think for themselves and have
slavishly followed his lead. I see that you have not looked up "active"
and "passive" and thought about what the phrase "active iterator"
actually means in English. You have simply quoted the errors of others.
Argument by authority is always suspect.
We now have a situation where the terms are actively confusing, and no
one who wants to communicate effectively uses them.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 1:41 PM
`I don't know what you mean by "glory,"' Alice said.
Humpty Dumpty smiled contemptuously. `Of course you don't -- till I tell
you. I meant "there's a nice knock-down argument for you!"'
`But "glory" doesn't mean "a nice knock-down argument,"' Alice objected.
`When _I_ use a word,' Humpty Dumpty said in rather a scornful tone, `it
means just what I choose it to mean -- neither more nor less.'
`The question is,' said Alice, `whether you CAN make words mean so many
different things.'
`The question is,' said Humpty Dumpty, `which is to be master -- that's
all.'
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 2:07 PM
> We now have a situation where the terms are actively confusing, and no
> one who wants to communicate effectively uses them.
Please let's define then:
- active iterator: use of a Cursor_Type object
- passive iterator: use of a generic iteration procedure
I hope that's right...
/*
Personally I don't find the active/passive metaphor the most appropriate.
Manual/automatic would be more fitting. Active/passive for me is more
suggestive of program/data and read-and-write/read-only.
Also a confusion was that before alternative 3 "iterator" meant two different
things, namely the cursor and the abstract procedure (use of...). But now it
only means the latter.
*/
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 5:46 PM
Marius Amado Alves wrote:
>
> Please let's define then:
> - active iterator: use of a Cursor_Type object
> - passive iterator: use of a generic iteration procedure
> I hope that's right...
No, the proposal has this right:
Cursor : a value that indicates a specific element in a container.
Iterator: a procedure that applies an action to each element in a
container in turn.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 9:38 AM
> - active iterator: use of a Cursor_Type object
Yes.
> - passive iterator: use of a Generic_Iteration procedure
Yes.
> I hope that's right...
Yes, that's correct.
> Also a confusion was that before alternative 3 "iterator" meant two different
> things, namely the cursor and the abstract procedure (use of...). But now it
> only means the latter.
An iterator is a mechanism for visiting elements in a container. There
are two kinds of iterators: "active" iterators and "passive" iterators.
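[Editor's note: A minimal sketch contrasting the two kinds of iterator as
the terms are used in this thread. It assumes the Ada.Containers.Vectors
package of later Ada standards (Ada 2005 onward), where the passive form is
an Iterate procedure taking an access-to-procedure rather than the generic
Generic_Iteration procedure of the proposal. The names Iterator_Kinds and
Print are hypothetical.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Iterator_Kinds is

   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Integer_Vectors;

   V : Vector;
   C : Cursor;

   --  The action applied to each element in both styles.
   procedure Print (Position : Cursor) is
   begin
      Ada.Text_IO.Put_Line (Integer'Image (Element (Position)));
   end Print;

begin
   V.Append (10);
   V.Append (20);
   V.Append (30);

   --  Active iterator: the client owns the loop and advances the cursor.
   C := V.First;
   while Has_Element (C) loop
      Print (C);
      Next (C);
   end loop;

   --  Passive iterator: the client supplies the action and the container
   --  drives the traversal.
   V.Iterate (Print'Access);
end Iterator_Kinds;
```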
****************************************************************
From: Stephen Leake
Sent: Tuesday, February 10, 2004 2:45 PM
Jeffrey Carter <jrcarter@acm.org> writes:
> Allowing the user to know the current size doesn't seem very useful to
> me, but I don't see how it can hurt. Allowing the user to force a resize
> does seem unwise.
The user could run her application for a while, then query the current
size of the map and store it in a config file. Then, when the
application starts the next time, it reads the required size from the
config file, and calls Map.Resize. The intent is that this allows the
application to avoid all the resizes on the second run.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 10, 2004 3:00 PM
That's one (clever) application of Resize. The intent is that if you
know a priori what the ultimate number of elements will be, then this
avoids any expansion during insertion. Insertion behavior is thus more
uniform.
See the examples in ai302/hash and ai302/hash2 in the reference
implementation for more ideas.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040210.zip>
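[Editor's note: A minimal sketch of pre-sizing a map when the eventual
number of elements is known in advance. It assumes the
Ada.Containers.Hashed_Maps package of later Ada standards (Ada 2005
onward), where the operation is named Reserve_Capacity rather than the
Resize of the proposal. Expected_Count stands in for a value that might be
read from a configuration file; it and the Hash function are hypothetical.]

```ada
with Ada.Containers.Hashed_Maps;
with Ada.Text_IO;

procedure Presize_Map is
   use Ada.Containers;

   --  Hypothetical hash function for Integer keys.
   function Hash (Key : Integer) return Hash_Type is
   begin
      return Hash_Type'Mod (Key);
   end Hash;

   package Integer_Maps is new Ada.Containers.Hashed_Maps
     (Key_Type        => Integer,
      Element_Type    => Integer,
      Hash            => Hash,
      Equivalent_Keys => "=");

   --  Assumed to be known a priori, e.g. saved by a previous run.
   Expected_Count : constant Count_Type := 1_000;

   M : Integer_Maps.Map;
begin
   --  Ask for room up front so that the insertions below need not
   --  trigger any expansion of the table.
   M.Reserve_Capacity (Expected_Count);

   for I in 1 .. Integer (Expected_Count) loop
      M.Insert (Key => I, New_Item => I);
   end loop;

   Ada.Text_IO.Put_Line ("Length   :" & Count_Type'Image (M.Length));
   Ada.Text_IO.Put_Line ("Capacity :" & Count_Type'Image (M.Capacity));
end Presize_Map;
```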
****************************************************************
From: Jeffrey Carter
Sent: Tuesday, February 10, 2004 6:40 PM
I think I was talking about vectors.
Length is sufficient for this. The main problem is that the
specification prohibits some implementations: Resize is specified as
requiring an allocation, which may not be appropriate for some
implementations. Size_Hint, with no requirement what the implementation
does with the value, is more appropriate.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 10:37 PM
Jeffrey Carter wrote:
(Sorry, I missed this yesterday.)
...
> OK. Let's see if I understand your position correctly. Resize gives the
> implementation a hint about a reasonable size to use, but the
> implementation may do whatever it wants, including nothing. Size returns
> the actual size of something if Resize has not been called, but the last
> size given to Resize if Resize has been called, regardless of what the
> implementation does (or doesn't) do with that size.
>
> So it appears that you are saying the implementation is required to keep
> track of whether Resize has been called, and to store the size passed to
> Resize. That doesn't seem like a very useful requirement to me.
Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
effect on Obj'Size, but in any case, if you set it, you have to return the
same value (even if that value has nothing to do with how objects are
actually stored).
...
> Resize is an appropriate name for the operation as specified. I expect
> an operation named Resize to cause resizing. If we're really talking
> about giving the implementation a hint about an appropriate size, then
> not only does the specification need to be changed, the name also needs
> to be different (perhaps Size_Hint?).
I don't see a strong need to change the name, but I do agree with you that
there shouldn't be a *requirement* to do some allocation.
...
> > No, a stringspace implementation would be much better than
> > Unbounded_String for storing large numbers of strings of unknown
> > length. That's precisely the idea of this component (and the reason
> > it exists separately). Unbounded_Strings require many tiny
> > allocations, while a stringspace implementation requires just one (or
> > a few) larger ones.
>
> In general, a key is added to a map only once, and never modified. Using
> Unbounded_String would, therefore, only need one allocation per key, so
> I don't see that many tiny allocations are needed. However, you probably
> know more about this sort of thing, since compilers need to do this kind
> of thing a lot, so I may well be mistaken.
One allocation per key is a lot more than one allocation per *map*, which is
what a stringspace implementation takes. (Well, it might have to expand if
it gets full, but that should be rare. It could degrade to one allocation
per key if the keys are very, very long, but some care in implementation
should prevent degrading.)
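[Editor's note: A minimal sketch of the stringspace idea, in which all
strings live in one block and are referred to by small handles, so storing
a string costs no separate allocation. This is not the component referred
to in the mail; the names Handle, Store and Fetch, and the fixed capacity
of 1024 characters, are hypothetical. A real implementation would expand
the block when it fills rather than raise Constraint_Error.]

```ada
with Ada.Text_IO;

procedure String_Space_Sketch is

   --  A handle records where a stored string lives inside the block.
   type Handle is record
      First : Positive;
      Last  : Natural;
   end record;

   Space : String (1 .. 1_024);  --  one block holds every stored string
   Used  : Natural := 0;         --  number of characters in use

   A, B : Handle;

   --  Copy S into the block and return a handle to it.
   --  Raises Constraint_Error if the block is full (sketch only).
   function Store (S : String) return Handle is
      H : constant Handle := (First => Used + 1, Last => Used + S'Length);
   begin
      Space (H.First .. H.Last) := S;
      Used := H.Last;
      return H;
   end Store;

   --  Retrieve the string a handle refers to.
   function Fetch (H : Handle) return String is
   begin
      return Space (H.First .. H.Last);
   end Fetch;

begin
   A := Store ("alpha");
   B := Store ("a considerably longer key");
   Ada.Text_IO.Put_Line (Fetch (A));
   Ada.Text_IO.Put_Line (Fetch (B));
end String_Space_Sketch;
```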
> > Huh? What could you do with a separate hash table that you couldn't
> > do with a map? The hash "buckets" contain *something*, and that
> > something is (or can be) the same as the map elements.
>
> Suppose I want to store Integers in a hash table so I can determine if
> I've seen one before. There is no mapping from Integers to anything
> else. Yes, I can do that with a map, by providing a dummy type for the
> element type, and a dummy value for the element parameters, but that's
> an ugly kludge. Defining a map in terms of a hash table is neither ugly
> nor a kludge.
I have a component like that (it's actually Tom Moran's), but in practice,
I've *never* used it without using the index values it provides to manage
some other data in a separate table (at least statistics and/or debugging).
Even the 'known words' list in the spam filter uses the indexes (handles)
for debugging. If that's the case, why bother having to use a separate
component (causing another chance of error)?
So I would guess that the "dummy type" would gain some real data in 95% of
the applications. And that such uses are less than 10% of the uses of a map
anyway. Since this is a minimal library, we're not trying to cover that
remaining 0.5%.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 9, 2004 7:41 PM
Matt Heaney said:
...
> As I mentioned in my previous message, Resize specifies a hint about the
> future number of elements in --that is, the length of-- the container.
> My assumption is that no container will ever have more than Integer'Last
> number of elements.
Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
assume that no container ever has more than 32767 elements??
> If that assumption is incorrect, then maybe the container can be allowed
> to grow internally to more than Integer'Last number of elements, but can
> only report a maximum value of Integer'Last.
>
> Subtype Natural is the correct choice for the vector Resize operation.
>
> I think the ARG wants to use Hash_Type for Resize for the maps. My
> reference implementation still uses Natural.
Wow! I've been promoted to be the entire ARG! :-)
No, I think we should use a purpose-built type for this, just like we did
for hashing (and for the same reasons). I hope we don't repeat the mistake
of Ada.Strings.Unbounded (which, at least has a justification for making
that mistake).
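[Editor's note: A minimal sketch of a purpose-built count type whose range
does not depend on the implementation's choice for Integer. The name
Element_Count and its upper bound are hypothetical; an implementation may
reject the declaration if the range exceeds what it supports.]

```ada
with Ada.Text_IO;

procedure Count_Type_Sketch is

   --  Hypothetical count type with an explicitly stated range, so that
   --  it can exceed 32767 even where Integer is only 16 bits wide.
   type Element_Count is range 0 .. 2**31 - 1;

   Big : constant Element_Count := 100_000;

begin
   Ada.Text_IO.Put_Line
     ("Integer'Last       =" & Integer'Image (Integer'Last));
   Ada.Text_IO.Put_Line
     ("Element_Count'Last =" & Element_Count'Image (Element_Count'Last));
   Ada.Text_IO.Put_Line
     ("Big                =" & Element_Count'Image (Big));
end Count_Type_Sketch;
```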
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 10, 2004 9:19 AM
> Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> assume that no container ever has more than 32767 elements??
I assumed that type Integer corresponded to the "natural" word size of
the machine, and that if Integer were only 16 bits that this portended
other, more invasive resource issues, which precluded very large numbers
of container elements.
But it just goes to show you I don't know very much...
> Wow! I've been promoted to be the entire ARG! :-)
Sorry about that, I should have said "ARG select committee on
containers" but laziness got the better of me. I'll try to be more clear
in the future.
> No, I think we should use a purpose-built type for this, just like we did
> for hashing (and for the same reasons). I hope we don't repeat the mistake
> of Ada.Strings.Unbounded (which, at least has a justification for making
> that mistake).
OK. But it would be nice if the operators of the length/count/size type
were directly visible at the point where the container instance is
declared, without having to with Ada.Containers too.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 9:53 PM
Matt Heaney wrote:
> Randy Brukardt wrote:
>
> > Ada only requires that Integer'Last is 2**15-1. That's 32767. Do you want to
> > assume that no container ever has more than 32767 elements??
>
> I assumed that type Integer corresponded to the "natural" word size of
> the machine, and that if Integer were only 16 bits that this portended
> other, more invasive resource issues, which precluded very large numbers
> of container elements.
Never assume about the Standard. :-)
Janus/Ada made the choice of leaving Integer at 16-bits to ease porting of our
many 16-bit customers to our 32-bit compilers. That probably was a bad choice
(because it harms portability of other Ada code to Janus/Ada), but in any case
we're pretty much stuck with it. (Changing would break too much existing code
and especially files.)
3.5.4(21) is the only requirement on the range of Integer; there isn't anything
else, not even Implementation Advice, about going further. If you want
something specific, declare your own.
> > Wow! I've been promoted to be the entire ARG! :-)
>
> Sorry about that, I should have said "ARG select committee on
> containers" but laziness got the better of me. I'll try to be more clear
> in the future.
No, this idea was one that I idly mentioned (and dismissed) a couple of days
ago. I'm pretty sure no one else has talked about it (in either direction).
****************************************************************
From: Jeff Cousins
Sent: Tuesday, February 10, 2004 9:45 AM
Given that the Booch components are now available for free from AdaPower, is
there a pressing need for other containers?
Though having said that, we paid for the Booch components but only found
list_single_bounded_managed, list_utilities_single, heap_sort and quick_sort
to be of much use.
****************************************************************
From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:44 AM
> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.
I agree with Matt on this one, especially as regards 'Class.
However, I think that strings should be treated as a special case. It seems
to me that the easiest approach is to provide a special version of the
packages for this case (a wrapper), which accepts string parameters (and
return type from functions), and uses unbounded internally. This wrapper can
be implemented on top of the basic library (instantiate with unbounded, and
let the wrapper routines simply do the string<->unbounded string
conversions).
One of the good things about having a standard library is that the
restricted component I described and others like it are going to be easy to
create, and share, seeing as they are based on packages all Ada users are
likely to have available. It is not mandatory they themselves be part of the
standard (though in this case I think it would be a valuable addition).
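[Editor's note: A minimal sketch of the kind of wrapper described above: a
container that accepts and returns String but stores Unbounded_String
internally. It assumes the Ada.Containers.Vectors package of later Ada
standards (Ada 2005 onward). The package String_Vectors and its operations
are hypothetical and cover only two operations.]

```ada
with Ada.Containers.Vectors;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure String_Wrapper_Sketch is
   use Ada.Strings.Unbounded;

   --  The underlying container holds a definite element type.
   package Unbounded_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Unbounded_String);

   --  Hypothetical wrapper exposing a String-based interface.
   package String_Vectors is
      type Vector is private;
      procedure Append (V : in out Vector; Item : String);
      function Element (V : Vector; Index : Positive) return String;
   private
      type Vector is record
         Rep : Unbounded_Vectors.Vector;
      end record;
   end String_Vectors;

   package body String_Vectors is

      --  Convert on the way in ...
      procedure Append (V : in out Vector; Item : String) is
      begin
         V.Rep.Append (To_Unbounded_String (Item));
      end Append;

      --  ... and on the way out.
      function Element (V : Vector; Index : Positive) return String is
      begin
         return To_String (V.Rep.Element (Index));
      end Element;

   end String_Vectors;

   Names : String_Vectors.Vector;

begin
   String_Vectors.Append (Names, "short");
   String_Vectors.Append (Names, "a considerably longer string");
   Ada.Text_IO.Put_Line (String_Vectors.Element (Names, 1));
   Ada.Text_IO.Put_Line (String_Vectors.Element (Names, 2));
end String_Wrapper_Sketch;
```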
****************************************************************
From: Ehud Lamm
Sent: Tuesday, February 10, 2004 12:52 AM
> The most important point in a container library is *completeness* I would
> say. This is exactly what STL has done.
This is a good point, and keep in mind that I firmly belong to the 80/20 camp.
The reason why this point is well taken is that no one is likely to want to
use 2 (or 3) different container libraries inside one application. So the
feature rich library is likely to win over restricted (even standard) ones.
At least when building the _second_ application using such a library...
However, I don't think this means adding more stuff to Ada.Containers at this
point. Let's be practical here. Time is short etc. etc.
What should be done, however, is for the community to provide more
components based on the same style (and based on the simple building blocks
that are part of the standard lib). Some of these will be adopted into the
core later on, and some will simply coexist nicely with the standard lib
while remaining independent.
****************************************************************
From: Martin Krischik
Sent: Tuesday, February 10, 2004 2:07 PM
On Monday, 9 February 2004 23:28, Robert A Duff wrote:
> Regarding support for indefinite keys,
>
> Martin Krischik said:
> > But you could not even store a collection of strings. Ok, there are
> > unbounded strings. But storing 'Class, that's the killer feature. If
> > Ada.Containers can't do it I am not interested. There will be no 20%/80%
> > split. It's 0% - I won't use them.
>
> How about this: you write a package that supports the indefinite case,
> and you build it on top of the (currently proposed) standard package
> that supports only definite?
Did that already - but it is based on the Booch components.
> The point is, you *can* use the definite-only package, but only
> indirectly, via a wrapper package. The definite-only package isn't
> useless; it does *part* of the job you desire. This seems like a better
> design than making a single package that supports both, and somehow
> magically optimize the definite cases.
Agreed, two packages are better than one. And currently I do the same with the
Booch components - only I create one from the other with the help of a text
filter instead of using a wrapper.
> If the RM supports indefinite, I claim it should do so by providing two
> separate packages. But we're trying to minimize the size of all this,
> so we choose just the lower-level one of those.
Maybe the RM should suggest names for extended containers.
****************************************************************
From: Martin Krischik
Sent: Tuesday, February 10, 2004 2:16 PM
On Monday, 9 February 2004 19:52, Matthew Heaney wrote:
> The library is designed around the common case, which means definite key
> and element types.
>
> If you want to store elements of type T'Class, then you have to use an
> access type to instantiate the component, and then do the memory
> management of elements yourself.
>
> This is how it should be.
If a garbage collector was provided as well: Yes. Otherwise NO!!
There is something which upsets me greatly:
Half the Ada community says: No garbage collector please! - The container
library should do memory management.
The other half says: No, container libraries should not provide memory
management.
It would be better for Ada if the Ada community made up its mind.
Well, since in AdaCL I have both, I made my mind up: container libraries with
memory management are more useful.
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, February 10, 2004 2:55 PM
I just wrote the thing excerpted below.
The whole is available at
http://www.liacc.up.pt/~maa/containers
Thanks.
-- TRUC : TRUE CONTAINERS
-- by Marius Amado Alves
--
-- Truc is a proof-of-concept implementation of AI-302/3
-- for indefinite elements, i.e. indefinite generic formal
-- element types (reorder the 4 adjectives at will).
--
-- Truc automatically chooses the appropriate implementation
-- for the actual type. Definite actuals select a Charles-like
-- body, whereas indefinite ones select a SCOPE-like one.
--
-- Truc is 100% written in Ada. Some optimizations could be
-- done by going a bit outside the language. This is
-- discussed elsewhere.
--
-- Only the vector variety is implemented.
-- Only a subset of the interface is implemented.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, February 10, 2004 6:25 PM
I'm going back and filing all of these messages about this AI, and I'm
continually seeing statements like:
"If the containers don't have <my pet feature>, I'm not going to use them."
I realize hyperbole is common on mailing lists, but you have to keep in mind
the current situation.
In order to meet the schedule, the ARG needs to complete proposals by the
end of the June meeting, or they're not going to be in the standard. That
reality means that there is not time to develop a significantly different
proposal. (Wordsmithing is different; I expect there to be plenty of
wordsmithing done on this proposal. I certainly hope that some of the
problems noted by Jeff Carter (for instance) are fixed.)
The strategy proposed by the committee was to standardize something like
AI-302-3, and encourage the development of a secondary standard (at a more
leisurely pace!) to handle creating additional containers to provide
additional functionality, both performance related (bounded forms, lists,
etc.), functional (sorted_maps, unsorted_sets, etc.), and operational
(indefinite keys, indefinite elements, limited elements). We hope that
providing a standard root will channel future developments in a common
direction, rather than the scattershot approach that's currently prevalent.
The ARG is going to have to decide either to follow that strategy, or
essentially give up (because there is no time to develop an alternative).
Now, when you say "I won't use it.", you're putting the ARG members into a
spot:
1) Either the ARG has to standardize over the objections of users, "because
we know better"; or
2) Decide that there is insufficient consensus, and forget the proposal.
My feeling about the brief discussion at the San Diego meeting is that some
ARG members view this as an insolvable problem, and would just as soon
forget it (tossing it to some undefined International Workshop Agreement
process). It took a lot of persuading by Tucker (and to a lesser extent, by
me and a couple of others) to set up the committee rather than just tossing it
at that meeting.
I fully expect to revisit that at our next meeting. If the discussion here
gives the opponents too much ammunition, there probably won't be a standard
container library in Ada now (and I personally think *ever*).
If that is your true opinion, feel free to express it - I'd rather spend my
time working on something that will likely be in the standard in that case!
But otherwise, I'd suggest cutting down the hyperbole in your messages.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 4:03 AM
They're not hyperboles. Please don't patronize. The wanted features
missing in the proposal had been expressed since long ago, even
formally, and repeatedly till now, and probably people saw this
discussion as a last chance to win the "resistance".
It is clear now we must abandon all hope. Thanks for making that clear
at last.
So it's an incomplete library or none at all. I only fear an incomplete
standard can do more harm than good, principally in respect to
attracting new programmers to the language--by creating a bad first
impression. (For what it's worth, I'd say toss it. Those Front and Back
things looked terrible anyway.)
****************************************************************
From: Pascal Obry
Sent: Wednesday, February 11, 2004 4:25 AM
> So it's an incomplete library or none at all. I only fear an incomplete
> standard can do more harm than good, principally in respect to
> attracting new programmers to the language--by creating a bad first
> impression. (For what it's worth, I'd say toss it. Those Front and Back
> things looked terrible anyway.)
I tend to agree with Marius here. Especially if the next change is for 5 or
10 years from now! A good programming language needs to provide a decent
container library today. As I have already said, if the library is not broad
enough it will just not be used, Charles, PragmArc or the Booch components
will be used instead... In this case it is not even necessary to add a set
of standard containers...
Just my 2 cents of course.
****************************************************************
From: Marc A. Criley
Sent: Wednesday, February 11, 2004 7:57 AM
It looks like the frequent situation of no lack of devil's advocates (who
are only trying to make things better), and too few championing angels :-)
The Ada software I develop can be split into two broad categories:
performance critical, and non performance critical.
When writing the latter, NIH (Not-Invented-Here) is a dirty word to me. I
want to write software fast and right, and I'll happily reuse standard
components, my own stuff that I've got laying around, and whatever utilities
and libraries have been posted on the Internet for free use.
For example, while Unbounded_Strings gets a lot of abuse in Ada discussions,
it and standard strings have pretty much provided all of the string
processing I've ever needed.
I fully expect as well that I'll be extensively employing Ada.Containers
just as soon as they're standardized.
I don't care if there are some purported conceptual weaknesses or omissions,
or if the implementation could be improved--if it provides the functionality
I need, is effectively bug-free, performs "good enough", and better yet is
part of the Ada standard, it gets used.
I don't want to write infrastructure if there's already packages that
provide it. I don't want to have to select among the pros and cons of six
different home-grown container package collections and then have to concern
myself with whether the developer is going to maintain them, or if I have to
take on the responsibility for that as well. (I've never concerned myself
with who's maintaining the Ada.Strings hierarchy, but I do now have to
maintain my own version of a particular container collection.)
I want Ada.Containers, and I will use them. Make them as good and powerful
as you can, and then shut off the discussion and release them.
****************************************************************
From: Martin Dowie
Sent: Wednesday, February 11, 2004 8:22 AM
> I want Ada.Containers, and I will use them. Make them as good and powerful
> as you can, and then shut off the discussion and release them.
I'd second that.
What is currently proposed is admittedly limited but it
would be useful. If Matt could adapt "Charles" into the
core of the secondary standard that would be great too.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:36 AM
That is indeed the plan. The current proposal has only a modest set of
containers but we have to start somewhere.
If you need something right away, there is a reference implementation
available at my home page.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:44 AM
I think you'll find that in spite of its modest size, the containers in
the current proposal are indeed very, very useful.
See in particular the !examples section in the AI itself.
<http://www.ada-auth.org/cgi-bin/cvsweb.cgi/AIs/AI-20302.TXT?rev=1.1>
The reference implementation contains several examples, too.
****************************************************************
From: Martin Krischik
Sent: Wednesday, February 11, 2004 12:38 PM
> The strategy proposed by the committee was to standardize something like
> AI-302-3, and encourage the development of a secondary standard (at a more
> leisurely pace!) to handle creating additional containers to provide
> additional functionality, both performance related (bounded forms, lists,
> etc.), functional (sorted_maps, unsorted_sets, etc.), and operational
> (indefinite keys, indefinite elements, limited elements). We hope that
> providing a standard root will channel future developments in a common
> direction, rather than the scattershot approach that's currently prevalent.
Ok, you are right there. I can easily live with "indefinite later" - to name
my pet feature - however some expressed an "indefinite never" stand and I
can't live with that.
****************************************************************
From: Marc A. Criley
Sent: Wednesday, February 11, 2004 2:23 PM
I fear the participants on this list are rather detached from the "average
Ada programmer" experience.
Of all the dozens of Ada programming _coworkers_ I've worked with over the
years, I could count on one hand (and not even need all the fingers) the
number that would know or care what Charles or PragmArc are (much less
something called an "ARG"), and those few who'd heard of Booch would recall
it as just something that had been used in the early days.
Where are the journeyman programmers for whom Ada is just the language they
write code in going to find data structures? If it doesn't show up in the
reference manual, it'll be borrowed from some home- or project-grown thing
that was done before, get ginned up yet again from scratch, or maybe get
copied out of a dog-eared Ada textbook.
Meanwhile, the C++ programmers have got the STL handed to them on a platter,
and the Java programmers have got their big JDK posters and Javadocs with
all those containers documented and ready to use.
But for the Ada programmer that just clocks in, codes, and goes home to
their family, nothing.
****************************************************************
From: Pascal Obry
Sent: Wednesday, February 11, 2004 2:59 PM
> I fear the participants on this list are rather detached from the "average
> Ada programmer" experience.
This is not about average something or not. Just that I'm using Ada in the
Information Technology domain. I don't really care(1) about size or
performance; these are neither hard real-time nor embedded applications. In the
IS field we need a decent container library to speed up development. What I'm
saying is that if the container library is not complete I'll use something
else. And since people on the embedded or real-time field are certainly not
going to use the standard containers but most probably some simpler version
hand-coded for the application I'm a bit concerned about the current path...
Ada is *not only* an embedded real-time programming language!
Pascal.
(1) I did not say that I want quick and dirty code :)
(2) BTW, I'm not sure to be an average Ada programmer :)
****************************************************************
From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:06 PM
> Ok, you are right there. I can easily live with "indefinite later" -
> to name my pet feature - however some expressed an "indefinite never"
> stand and I can't live with that.
I don't remember anybody saying "indefinite never", but anyway, *my*
opinion is that a secondary standard containing a rich variety of stuff,
including all the bells and whistles that various folks have asked for,
including indefinite component types, would be a Good Thing.
But somebody has to take charge and push such a secondary standard
through. I'm not volunteering. ;-)
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 6:02 PM
The only problem is that there doesn't seem to be any mechanism for such
a secondary standard. Indeed, the initial call for proposals for the
standard container library indicated that it was for a secondary
standard, but it is intended to become part of the ARM now.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 10:42 PM
The intent is to use a new ISO procedure called an "International Workshop
Agreement". These get published essentially immediately (no lengthy approvals),
and then can later be turned into real standards if that proves to be a good
idea.
But, as Bob mentioned, there have to be people to drive that "Workshop"
(which doesn't need to be an actual workshop per-se).
****************************************************************
From: Jean-Pierre Rosen
Sent: Thursday, February 12, 2004 2:57 AM
There is such a mechanism: it is called an International Workshop Agreement. It
is a relatively new ISO procedure, giving official status (though not formally
Standard) to a specification for which there is consensus. Such an IWA may
become later a full-fledged standard.
Since it is new, nobody really knows how this works, and whether vendors would
feel bound to providing packages defined by IWA. But the mechanism is here.
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 12, 2004 12:49 PM
OK. How do we get such a "workshop" set up? I hope it's obvious that I'm
willing to participate.
****************************************************************
From: Jean-Pierre Rosen
Sent: Friday, February 13, 2004 4:44 AM
It is an ISO process, therefore you should get in touch with Jim Moore.
Of course, the first thing is to have a convenor. If you step forward...
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:02 AM
As an example of the approach Bob is advocating here, I have included
two examples in the latest reference implementation.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040211.zip>
The two new examples are for a vector of indefinite elements and a set
of indefinite elements.
Both were implemented as a thin layer on top of the vector and set
containers provided by the library itself.
Neither one took very long to write (in fact I did it while watching an
episode of The Simpsons).
In the indefinite set example, I use the nested generic package
Generic_Keys, and its nested generic package Generic_Insertion.
In the indefinite vector example, I use the library-level Generic_Sort
generic algorithm.
In the test code, I instantiate each component with type String (an
indefinite type).
Note that if you want to instantiate the component with a class-wide
tagged type T'Class, then you'll probably have to declare these
class-wide operations somewhere:
function Is_Equal (L, R : in T'Class) return Boolean is
begin
return L = R;
end;
function Is_Less (L, R : in T'Class) return Boolean is
begin
return L < R;
end;
and then use these as the generic actuals for "<" and "=".
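[Editor's note: the class-wide wrappers above can be tried out with a minimal,
self-contained sketch such as the following. The tagged type Shape, its Area
component and its primitive "<" are hypothetical stand-ins for T and are not
taken from the reference implementation. The sketch assumes that T declares a
primitive "<", so that the call inside Is_Less dispatches.]

```ada
with Ada.Text_IO;

procedure Classwide_Compare_Demo is

   --  Hypothetical tagged type standing in for T.
   package Shapes is
      type Shape is tagged record
         Area : Natural := 0;
      end record;
      --  Primitive ordering; this is what Is_Less dispatches to.
      function "<" (L, R : Shape) return Boolean;
   end Shapes;

   package body Shapes is
      function "<" (L, R : Shape) return Boolean is
      begin
         return L.Area < R.Area;
      end "<";
   end Shapes;

   use Shapes;

   --  Class-wide wrappers; both operands are class-wide, so the calls
   --  to "=" and "<" dispatch on the tag of the operands.
   function Is_Equal (L, R : Shape'Class) return Boolean is
   begin
      return L = R;
   end Is_Equal;

   function Is_Less (L, R : Shape'Class) return Boolean is
   begin
      return L < R;
   end Is_Less;

   Small_Shape : constant Shape := (Area => 1);
   Large_Shape : constant Shape := (Area => 2);

begin
   if Is_Less (Small_Shape, Large_Shape)
     and then Is_Equal (Small_Shape, Small_Shape)
     and then not Is_Equal (Small_Shape, Large_Shape)
   then
      Ada.Text_IO.Put_Line ("class-wide comparisons behave as expected");
   else
      raise Program_Error;
   end if;
end Classwide_Compare_Demo;
```

[Note that a dispatching call to "<" with operands of different specific
types fails its tag check and raises Constraint_Error, whereas the predefined
"=" simply returns False in that case.]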
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 8:59 AM
> But Ada hasn't got a garbage collector so there is the deallocation problem.
> Especially when the container is copied or passed around.
The latest version of the reference implementation has examples of a
vector of indefinite elements and a set of indefinite elements.
Internally both instantiate the underlying container with a simple
controlled type that manages an access object designating an object of the
element type of the higher-level container.
See the Insert_N and Replace_Element operations in the indefinite vector
package for a brief discussion of the various tradeoffs involved. See
also Generic_Sort2.
See also the indefinite sets package for an example of how to use the
Generic_Keys nested generic package.
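[Editor's note: the general technique described above, a controlled type that
owns a heap copy of an indefinite element, can be sketched roughly as follows.
This is not the code from the reference implementation; the names Holders,
Holder, To_Holder and Element are hypothetical. An array stands in for the
definite-element container. The generic is nested in a procedure to keep the
sketch in one file, which assumes Ada 2005 or later rules for controlled
types declared at a nested level.]

```ada
with Ada.Finalization;
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Holder_Demo is

   generic
      type Element_Type (<>) is private;  --  may be indefinite
   package Holders is
      type Holder is private;             --  definite, so storable anywhere
      function To_Holder (Item : Element_Type) return Holder;
      function Element (H : Holder) return Element_Type;
   private
      type Element_Access is access Element_Type;
      type Holder is new Ada.Finalization.Controlled with record
         Ref : Element_Access;
      end record;
      procedure Adjust (H : in out Holder);
      procedure Finalize (H : in out Holder);
   end Holders;

   package body Holders is

      procedure Free is
        new Ada.Unchecked_Deallocation (Element_Type, Element_Access);

      function To_Holder (Item : Element_Type) return Holder is
      begin
         return (Ada.Finalization.Controlled with
                 Ref => new Element_Type'(Item));
      end To_Holder;

      function Element (H : Holder) return Element_Type is
      begin
         return H.Ref.all;
      end Element;

      --  After assignment, give the target its own copy of the element.
      procedure Adjust (H : in out Holder) is
      begin
         if H.Ref /= null then
            H.Ref := new Element_Type'(H.Ref.all);
         end if;
      end Adjust;

      --  Free resets Ref to null, so a second Finalize is harmless.
      procedure Finalize (H : in out Holder) is
      begin
         Free (H.Ref);
      end Finalize;

   end Holders;

   package String_Holders is new Holders (String);

   Items : array (1 .. 2) of String_Holders.Holder :=
     (String_Holders.To_Holder ("short"),
      String_Holders.To_Holder ("a somewhat longer string"));

begin
   for I in Items'Range loop
      Ada.Text_IO.Put_Line (String_Holders.Element (Items (I)));
   end loop;
end Holder_Demo;
```

[Copying a Holder copies the designated element, so the cost of an
assignment is one allocation per element; this is one of the tradeoffs
alluded to above.]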
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 9:24 AM
> -- Truc is a proof-of-concept implementation of AI-302/3
> -- for indefinite elements, i.e. indefinite generic formal
> -- element types (reorder the 4 adjectives at will).
The latest version of the reference implementation has two new examples:
one for a vector of indefinite elements and another for a set of
indefinite elements.
There is no "automatic" selection of a package. The programmer chooses
the correct package himself, at the time of instantiation.
If he needs to store indefinite elements, then he instantiates the
package for indefinite elements.
If his element type is definite, then he has the choice of using either
the definite or indefinite packages. The package for definite elements
will be more efficient, of course.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 9:55 AM
> There is no "automatic" selection of a package. The programmer chooses
> the correct package himself, at the time of instantiation.
I know. I saw your code. It's fine. So one last try: how about configuring
these indefinite elements versions as the one-page specialized needs annex
below? Matt's manual choice approach has the virtue of fitting right in.
ANNEX <IE>
Containers of Indefinite Elements
This Annex provides support for containers of indefinite elements.
[Implementation Requirements]
An implementation conforming to this Annex shall have the package
Ada.Indefinite_Elements and descendants defined in this Annex.
[Static Semantics]
The specifications of the descendants of Ada.Indefinite_Elements are a copy of
the specifications of the descendants of Ada.Containers specified in A.17,
with the unique difference that, for each generic descendant of
Ada.Containers that has a definite element formal type, the corresponding
descendant of Ada.Indefinite_Elements has an indefinite formal type in its
place.
[Dynamic Semantics]
The behaviour associated with each container of Ada.Indefinite_Elements is
exactly like that defined in A.17 for the corresponding container of
Ada.Containers.
[Examples]
Specification of Ada.Indefinite_Elements.Vectors:
generic
type Index_Type is (<>);
type Element_Type (<>) is private;
with function "=" (L, R : Element_Type) return Boolean is <>;
package Ada.Indefinite_Elements.Vectors is
-- remainder of this package exactly like that of
-- Ada.Containers.Vectors
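[Editor's note: for comparison, Ada 2005 later standardized containers of
this kind under a different name, Ada.Containers.Indefinite_Vectors, rather
than the Ada.Indefinite_Elements hierarchy suggested here. A minimal sketch
of instantiating that package with an indefinite element type (String)
follows; the names String_Vectors and Indefinite_Vector_Demo are chosen for
illustration only.]

```ada
with Ada.Containers.Indefinite_Vectors;
with Ada.Text_IO;

procedure Indefinite_Vector_Demo is

   --  String is indefinite, so the definite-only Ada.Containers.Vectors
   --  could not be instantiated with it directly.
   package String_Vectors is new Ada.Containers.Indefinite_Vectors
     (Index_Type   => Positive,
      Element_Type => String);

   V : String_Vectors.Vector;

begin
   V.Append ("Ada");
   V.Append ("Containers");

   for I in V.First_Index .. V.Last_Index loop
      Ada.Text_IO.Put_Line (V.Element (I));
   end loop;
end Indefinite_Vector_Demo;
```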
****************************************************************
From: Robert A. Duff
Sent: Wednesday, February 11, 2004 11:02 AM
> I just wrote the thing excerpted below.
> The whole is available at
> http://www.liacc.up.pt/~maa/containers
> Thanks.
It seems inefficient to store *two* vectors for each vector,
and to select between them at run time, when 'Definite is generally
known at compile time. Why not let the programmer choose to instantiate
one or the other package?
Also, this code uses 'Unrestricted_Access, which is not Ada.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 9:22 AM
> The latest version of the reference implementation has examples of
> indefinite vectors and indefinite sets, both of which can be used to
> instantiate elements of type T'Class.
Good news!
BTW, Truc (www.liacc.up.pt/~maa/containers/truc.ada) has been updated also,
with:
- a test for classwide element types too (passed:-)
- cosmetics
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 12:51 PM
> It seems inefficient to store *two* vectors for each vector,
> and to select between them at run time, when 'Definite is generally
> known at compile time.
As I say in the Truc spec, going outside the language would make it optimized.
I can think of a number of ways to do so, and eliminate those inefficiencies.
Aside.
The two vectors problem could perhaps be eliminated inside the language using
tagged types (it would still be dynamic dispatching though, i.e. a runtime
choice). I tried that but Ada got in my way and so I solved the problem
quickly and dirtily.
Anyway it is not a big inefficiency in practice because only one vector is
used, and the other vector, never used and not even initialized, has
negligible space and zero time impact.
End of aside.
> Why not let the programmer choose to instantiate
> one or the other package?
Staying within Ada, yes, that is better, and supports my suggestion to put the
indefinite variants in a separate package branch defined in a specialized
needs annex. And using the already existing reference implementations by Matt
(released today).
> Also, this code uses 'Unrestricted_Access, which is not Ada.
You've got me, I'll have to change the 100% Ada claim to 99% :-)
I used 'Unrestricted_Access instead of the Rosen trick because element types
must be non-limited. Maybe there's another way, but I couldn't think of it.
Aside.
I used AI302.vectors of stream elements to avoid doing memory management, in
one more experiment in pointerless programming. And for other reasons. For
example programming for persistency: I can easily get a persistent container
just by changing the stream operations.
I needed write access to an "in" container because of this stream approach. Now
I'm curious if Matt's implementation has this and how he did it.
End of aside.
****************************************************************
From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:16 PM
Marius Amado Alves wrote:
> I know. I saw your code. It's fine. So one last try: how about configuring
> these indefinite elements versions as the one-page specialized needs annex
> below? Matt's manual choice approach has the virtue of fitting right in.
I like this idea, but I don't think it should be in a Specialized Needs
Annex (i.e. optional for implementers to support it). The problem is
not that it's hard to support, but that it adds extra verbiage to the
RM. You've shown, I think, that the extra verbiage could be pretty
small.
We compiler writers can probably even get Matt to code up the
implementation for us. ;-)
This idea is much better than a magic package that supports both
definite and indefinite efficiently.
****************************************************************
From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:35 PM
True.
However, I think going outside the language is a bad idea.
I say: An efficient implementation should be possible in pure Ada.
As an implementer, I have no intention of adding compiler magic for this
stuff -- I want to be able to just write pure Ada code (or, better yet,
take advantage of Matt's work).
Even Address_To_Access_Conversions makes me nervous -- yeah, it's Ada,
but it's rather ill-specified.
> I can think of a number of ways to do so, and eliminate those inefficiencies.
>
> Aside.
> The two vectors problem could perhaps be eliminated inside the language using
> tagged types (it would still be dynamic dispatching though, i.e. a runtime
> choice). I tried that but Ada got in my way and so I solved the problem
> quickly and dirtily.
>
> Anyway it is not a big inefficiency in practice because only one vector is
> used, and the other vector, never used and not even initialized, has
> negligible space and zero time impact.
> End of aside.
I don't want users of definite types to pay *any* penalty caused by
supporting indefinite types.
> > Why not let the programmer choose to instantiate
> > one or the other package?
>
> Staying within Ada, yes, that is better, and supports my suggestion to
> put the indefinite variants in a separate package branch defined in a
> specialized needs annex. And using the already existing reference
> implementations by Matt (released today).
As I said in my previous message, that suggestion seems reasonable,
except for the "specialized needs annex" part. For portability, we
don't need more optionally-supported features of Ada.
On the other hand, maybe support for indefinite is just too much
(for the Ada RM -- of course a secondary standard should support
all bells and whistles).
> > Also, this code uses 'Unrestricted_Access, which is not Ada.
>
> You've got me, I'll have to change the 100% Ada claim to 99% :-)
OK, but 99% isn't good enough. I want these packages to be implementable
in 100% pure Ada. If that's not possible (as in the defaulted-discrims
case somebody mentioned) we need to change the language to *make* it
possible.
> I used 'Unrestricted_Access instead of the Rosen trick because element
> types must be non-limited. Maybe there's another way, but I couldn't
> think of it.
Well, I can think of ways involving "for X'Address use.." or
Address_To_Access_Conversions, and I might be willing to live with that,
but I don't like it.
I didn't read your code carefully enough to understand whether
'Unrestricted_Access was really needed. Why not declare the thing
aliased, and use 'Unchecked_Access?
> Aside.
> I used AI302.vectors of stream elements to avoid doing memory management, in
> one more experiment in pointerless programming. And for other reasons. For
> example programming for persistency: I can easily get a persistent container
> just by changing the stream operations.
>
> I needed write access to an in container because of this stream
> approach. Now I'm curious if Matt's implementation has this and how he
> did it. End of aside.
Is it not possible to allocate the indefinite thing in the heap,
and still know when it needs to be freed? I don't like memory leaks...
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 4:24 PM
> I like this idea, but I don't think it should be in a Specialized Needs
> Annex (i.e. optional for implementers to support it). The problem is
> not that it's hard to support, but that it adds extra verbiage to the
> RM. You've shown, I think, that the extra verbiage could be pretty
> small.
I actually was going to suggest the same thing, given that the wording
needed is roughly the same supporting a "wide_string" version of something
given a "string" version.
I'll put it as an Open Issue in the "bug fix" update of the AI. (I don't
want to make major changes to the AI, because I don't want to present a
moving target to the ARG members who are supposed to be studying it for the
upcoming meeting...)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 4:57 PM
I have several places in the reference implementation that I've notated
with "NOTE", places in the AI where the semantics aren't exactly
specified, where there's disagreement, where there can be improvement,
etc. Should I send you a list or something? When would you like me to
do that?
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 5:21 PM
Sure, do that. Any time is fine, but no later than the start of next week.
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, February 11, 2004 5:56 PM
>>The specifications of the descendants of Ada.Indefinite_Elements are a copy of
>>the specifications of the descendants of Ada.Containers specified in A.17,
>>with the unique difference that, for each generic descendant of
>>Ada.Containers that has a definite element formal type, the corresponding
>>descendant of Ada.Indefinite_Elements has an indefinite formal type in its
>>place.
This doesn't seem quite right. The containers all have an Element_Type
formal, so it can specify that type. Maps have a Key_Type that should
also be indefinite, so it should specify it as well.
However, this seems like a good way to add support for indefinite
elements to the proposal. If only adding additional containers could be
this easy!
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 11, 2004 5:46 PM
>I'll put it [Annex <IE>] as an Open Issue in the "bug fix" update of the AI.
Great!
Annex <IE> is all it takes to make the proposal "complete". It is
already fairly complete with respect to structural varieties (vector,
set, map). What it is really missing is element type varieties
(definite, indefinite). The group (definite, indefinite) has *exactly*
the same properties as (vector, set, map). Primitive (in the good sense
of course), concise, complete, useful. (Definite, indefinite) as opposed
to (definite, indefinite, tagged, limited, abstract...), like (vectors,
set, map) vs. (vector, set, map, queue, list...), these two oppositions
are in perfect alignment. The extra things in each latter group can be
realised with the ones in the former. The proposal is complete only if
it is complete along at least these two axes (structural variety,
element type). The other axes--size, persistence--are of lesser impact.
It does not offend me at all to have them set to a fixed point in the
standard--unbounded, core memory--, and extend them in secondary
standards--(fixed, bounded, unbounded...), (core, cache, file...) A nice
symmetry. Container space has 4 axes (structure, element type, size,
persistence). Alternative 3 with Annex <IE> ranges over 2, and fixes a
point in the other 2. The ranges and points defining the most primitive
region. The standard region. I promise this is my last motivational
rambling for indefinite elements. I needed to have a view of the whole,
evidently I used my "system of coordinates", and I thought I might share
it with you.
Talking of Open Issues: the range vs. discrete index issue. I'd say
range. It solves the problem of Assert failing on enumerations. And the
use of an enumeration for the index of a *variable* length vector does
not make much sense. Ditto for modular types.
****************************************************************
From: Robert A. Duff
Sent: Wednesday, February 11, 2004 3:40 PM
One thing that disappoints me about the current containers proposal is
that there's no way to control memory allocation. The C++ STL allows
the client to define which storage pool should be used. Would it be
possible for us to do the same, without burdening users who just want to
use "the regular heap"?
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 11, 2004 4:17 AM
I don't think so. There were proposals offered for naming the standard
storage pool(s) and allowing defaults for generic formal parameters, but
both of those died an early death. (See AI-299 and AI-300.) Those were aimed
at solving this problem. Since we're not reintroducing existing, killed
proposals (and certainly the need for them in containers libraries was well
explained when first considered - there's no "new information" here), it
would have to be done without them.
That means that about the only way to do it would be with an access type
(with "null" meaning use the default pool). That seems very ugly to me,
especially as you would have to make the pool that you want to pass in
"aliased". And there doesn't seem to be a good place for that access type to
live.
In any case, that strikes me as creeping featurism.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 11, 2004 4:34 PM
You could do something like this:
generic
type Element_Type is private;
Pool : in out Root_Storage_Pool'Class;
with function "=" (L, R : Element_Type) return Boolean is <>;
package Ada.Containers.Vectors is ...; -- for example
There are several problems:
(1) The language standard doesn't specify any storage pool objects. I
suppose that the standard library could define a few default pool
objects, though.
(2) Even if you do have a pool then you run into problems with static
matching rules, since the generic formal pool type is T'Class, which
doesn't match a specific type NT in T'Class. So you have to resort to
hacks like:
   package My_Pools is
      My_Pool : My_Pool_Type;  --derives from RSP
      My_Pool_View : Root_Storage_Pool'Class
         renames Root_Storage_Pool (My_Pool);
   end;
and then use My_Pool_View as the generic actual pool object.
(3) It's in conflict with our design principle that components be easy
to instantiate and use. I would love to have a generic formal pool
object default a la "is <>" or "is <name>", but the language doesn't let
you specify defaults for generic formal objects.
(4) You might be able to get around (2) by declaring a generic formal
derived type:
   generic
      type ET is private;
      type Pool_Type is new Root_Storage_Pool with private;
      Pool : in out Pool_Type;
   package Ada.Containers.Vectors is ...;
but then this is in conflict with (3), because now there's another
formal type (which cannot be defaulted).
In C++ generic formal pool objects ("allocators") are allowed to have a
default, by constructing an allocator on-the-fly. But then there's some
rule about the STL that requires allocator objects be shared (or
something like that)??? And then it complicates things for implementors
because you have to use the "empty virtual base class" trick to avoid
allocating padding for otherwise empty classes.
Realize that adding custom allocator support to the STL complicated the
semantics somewhat (do objects have the same or different allocators? --
affects assignment rules, etc).
An early version of Charles allowed you to pass in a storage pool, but I
eventually gave it up because it was too many headaches for casual users
who didn't care about supplying their own pool.
If you've studied my reference implementation then you might have
noticed that the substrate package (e.g. charles.red_black_trees) used
to implement the higher-level container is written so that all the
allocation and deallocation is done by the higher-level package. The
substrate package is completely agnostic about how storage allocation
gets done. This allows the user of the instantiation of the red-black
tree (say) to use a pool if he wants, or indeed even statically allocate
the nodes. In fact the container elements can even be limited. The
substrate package doesn't care. All the ugliness is hidden from the
container user by the wrapper container package.
So I reached the conclusion that if someone (like, um, Bob Duff, who has
written lots of custom storage pools) needs a special sorted set that
uses some fancy storage pool, then it's not too hard to do that using
the substrate package directly and building his own wrapper class.
Perhaps there is a way to do this. It may be that there's some slick
language trick that I haven't figured out that would allow the user to
pass in his own pool without too much pain at instantiation time.
There is also the multi-threading issue. Clearly the user has the
responsibility to not allow concurrent access to the same container
object, but what about different threads each manipulating their own
container object, so we have multiple container objects (and hence
multiple threads) sharing a common pool object? But I suppose you could
use the same synchronization mechanism you use for alligator new.
Maybe you could make some other (non-limited) abstraction, and pass that
in as the default, e.g.
   generic
      type ET is private;
      Pool : Pool_Handle := Default_Pool;  --from somewhere
   package Generic_Containers is ...;
but the language doesn't give you anything like the placement new
construct in C++, which allows you to construct an object in-place, at a
location you specify.
There is a sort of hack you can do by declaring a pool object
on-the-fly, that binds to an object (in some raw form e.g. storage
elements) to be constructed using an access discriminant. Then you make
a dummy call to new, and internally the pool object specifies the
address of the object (to which the pool object is bound) as the address
return value. The run-time system will then call Initialize on that
object. Placement new in Ada95! But that's kind of a trick and I don't
really know if it will work.
****************************************************************
From: Tucker Taft
Sent: Wednesday, February 11, 2004 9:31 AM
I believe the intent is that all of these containers
use controlled types to avoid storage leakage, analogous
to what unbounded strings do. (In fact, I could imagine
that a vector and an unbounded string would have a lot
in common under the covers.)
So I'm not sure how a user-defined pool would interact
with that (and I fear based on our own experience that
putting finalizable things in user-defined pools can be
tricky).
Note that this will give more incentive for implementors
to "sharpen up" their implementation of controlled types.
I think that is a good thing, so we are spending energy
improving existing features of the language, rather than
dissipating energy on lots of different ways of skinning
the same cat.
****************************************************************
From: Tucker Taft
Sent: Wednesday, February 11, 2004 5:13 PM
Matthew Heaney wrote:
> ...
> (3) It's in conflict with our design principle that components be easy
> to instantiate and use. I would love to have a generic formal pool
> object default a la "is <>" or "is <name>", but the language doesn't let
> you specify defaults for generic formal objects.
I generally agree with your reasoning, but this particular statement
is false, unless you say "formal IN OUT objects." Formal IN objects
certainly can have defaults, and the way to pass in a storage pool
would be as Randy suggested, via an access value. The default could
be implementation-defined, with the semantics that it implies the
standard storage pool, if allowed to default.
But I still believe my earlier response, that mixing user-defined
storage pools and controlled types is asking for complexity, and
doesn't seem to buy enough to justify itself.
One reason to have a user-defined storage pool is to do some kind
of garbage collection, or to do mark/release. Both of those could
easily interfere with the implementation of controlled
types, unless the user was very careful, and had a pretty good idea about
how controlled types were implemented.
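[Editor's note: The access-value approach described above can be sketched
as follows. This is only an illustration under stated assumptions, not part
of any proposal: the names Pool_Handles, Pool_Access, Pooled_Vectors and
Integer_Vectors are hypothetical, and the representation of Vector_Type is
invented for the example. The sketch shows only that a generic formal IN
object may carry a default, so a client who wants "the regular heap" still
needs just one instantiation with one actual.]

```ada
with System.Storage_Pools;

package Pool_Handles is
   --  Hypothetical named access type, so that a pool can be passed as a
   --  generic formal IN object (which, unlike IN OUT, may have a default).
   type Pool_Access is
     access all System.Storage_Pools.Root_Storage_Pool'Class;
end Pool_Handles;

with Pool_Handles;

generic
   type Element_Type is private;
   --  Assumption: null is taken to mean "use the standard storage pool".
   Pool : in Pool_Handles.Pool_Access := null;
package Pooled_Vectors is
   type Vector_Type is private;
private
   type Element_Array is array (Positive range <>) of Element_Type;
   --  A real implementation would have to route allocation through
   --  Pool.all when Pool /= null; a plain 'Storage_Pool clause naming
   --  Pool.all would fail for the null default.  That is the ugliness
   --  mentioned in the discussion, and it is not solved here.
   type Element_Array_Access is access Element_Array;
   type Vector_Type is record
      Elements : Element_Array_Access;
      Length   : Natural := 0;
   end record;
end Pooled_Vectors;

with Pooled_Vectors;

--  The casual user supplies only the element type; Pool defaults to null.
--  A user with a custom pool would add an actual such as
--  Pool => My_Pool'Access, where My_Pool is an aliased library-level
--  pool object.
package Integer_Vectors is new Pooled_Vectors (Element_Type => Integer);
```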
****************************************************************
From: Simon J. Wright
Sent: Thursday, February 12, 2004 3:12 AM
Marc A. Criley wrote:
> Where are the journeyman programmers for whom Ada is just the
> language they write code in going to find data structures? If it
> doesn't show up in the reference manual, it'll be borrowed from some
> home- or project-grown thing that was done before, get ginned up yet
> again from scratch, or maybe get copied out of a dog-eared Ada
> textbook.
For a project of any size I would not expect journeyman programmers
to be making this sort of choice; it should be a matter of policy set
by the software architect(s), along with "how we use tasks", "how we
deal with exceptions" etc.
So the question is, where do the architects find stuff? and clearly
the ARM is a very good start (though I have to admit there are parts
of Annex A that I'm not at all familiar with and should be!
Strings.Maps, for example).
I started maintaining the BCs because I needed containers for a demo
project. Although it has been fun, I would never have done so if the
proposed library had been available, and I strongly support it.
****************************************************************
From: Marc A. Criley
Sent: Thursday, February 12, 2004 7:46 AM
Speaking as a software architect for both Ada and C++ projects, your
characterization of this aspect of the architect's job is quite correct.
One of my sub-tasks was ensuring that only the authorized container classes
were being used, even to the point of once having to threaten to reject a
developer's code if he didn't start using STL instead of coding up his own
comparable classes.
However, I've had plenty of experience with other architects and leads who
aren't on this mailing list, don't visit comp.lang.ada (or comp.lang.c++),
aren't on the Team-Ada mailing list, don't subscribe to any technical
magazines or journals, aren't in ACM, don't go home and code at night or on
weekends, and own only one book covering each programming language they have
to deal with, whether it be Ada, C++, Java, Perl, etc.
When they need container classes, they check the project's code base or they
go to their books, which for Ada does usually include the reference manual.
That's where container availability needs to be publicized, because if they
do find something on the Web (like Booch or Charles or PragmArc), they need
to first expend the effort to convince themselves that that "home-grown"
collection is something that would be useful, and then overcome most
management's resistance to using "free", unsupported software. Making
Ada.Containers part of the standard language distribution gives it a cachet
of legitimacy that means that architects/leads don't have to fight that
fight. And they end up with a container collection that will handle the
needs of most non-performance-critical projects.
****************************************************************
From: Stephen Leake
Sent: Thursday, February 12, 2004 11:48 AM
Just to state my position for the record:
I like AI-302-3.
I think the rationale for the design needs to be more clearly stated
(particularly why the key and element types are definite).
Examples of how to build packages supporting indefinite types would be
good; an actual standard for that (layered on top of the current one)
would be better, but I can wait for that.
I like the term "cursor" instead of "iterator"; "iterator" is clearly
overloaded, while "cursor" matches the usage in SQL.
Since these packages are intended to be low-level building blocks, I'd
rather see them called "unbounded_array", "hashed_map", and
"sorted_tree". But that's a small issue.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 3:41 PM
See the latest reference implementation (Thu, 12 Feb 2004) for examples
of using the canonical containers to implement indefinite vectors and
indefinite sets.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>
I'll have at least one example of a map of indefinite elements tomorrow.
> I like the term "cursor" instead of "iterator"; "iterator" is clearly
> overloaded, while "cursor" matches the usage in SQL.
The Iterator design pattern described in the Gamma book says that
"Cursor" is an alias for the term "Iterator", so you seem to be in good
company.
> Since these packages are intended to be low-level building blocks, I'd
> rather see them called "unbounded_array", "hashed_map", and
> "sorted_tree". But that's a small issue.
Low-level is a point of view.
It's a vector implemented as an unbounded array, not an unbounded array
per se.
It's a map, implemented using a hash table, but not a hash table per se.
It's a sorted set, implemented using a balanced (red-black) tree, but
not a tree per se.
Yes, they're building blocks. Yes, they're low level. But they're not
as low-level as unbounded arrays, hash tables, and red-black trees.
****************************************************************
From: Stephen Leake
Sent: Friday, February 13, 2004 8:19 AM
> > "sorted_tree". But that's a small issue.
>
> Low-level is a point of view.
>
> It's a vector implemented as an unbounded array, not an unbounded
> array per se.
Hm. Let's compare Ada.Containers.Vectors to SAL.Poly.Unbounded_Array.
Vectors has Insert in the middle, Sort, and Element_Access. SAL allows
indefinite and limited items. Otherwise they are the same.
Sort is a reasonable operation for any container; I would put it in a
child package, since many applications won't need it.
I guess that means SAL.Poly.Unbounded_Array is actually a "vector"?
What would a true low-level unbounded_array look like?
> It's a map, implemented using a hash table, but not a hash table per se.
I need to see your definition of "hash table"; this looks like one to me.
> It's a sorted set, implemented using a balanced (red-black) tree, but
> not a tree per se.
This one I'll grant you; it is more complex than just a tree.
> Yes, they're building blocks. Yes, they're low level. But they're
> not as low-level as unbounded arrays, hash tables, and red-black trees.
As long as the names are sufficiently clear, and it is clear how to
name new components that complement these, I'm happy. As I see it, all
of these names have loose enough definitions that this issue is _not_
a show stopper.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 9:33 AM
> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).
The Size function is analogous to the capacity() member function in the
STL vector class.
The Resize procedure is analogous to the reserve() member function.
A vector container is implemented internally as a contiguous array, that
expands as items are inserted into the container.
The Size function returns the length of the internal array. The Length
function returns the number of elements in the array that are "active,"
that have actually been inserted into the vector.
At all times a vector satisfies the invariant that
Length (V) <= Size (V)
The procedure Resize tells the vector to expand to at least the size
specified in the call.
If the current size is equal to or greater than the value specified,
then Resize does nothing.
If the current size is less than the value specified, then the internal
array is expanded. The standard does not specify the exact algorithm
for expansion, and only requires that the Size function return at least
the value specified.
There's nothing special an implementation needs to do to keep track of
the current value of the size, since it has that information already:
it's just the result of the 'Length attribute for the internal array.
>>Resize is an appropriate name for the operation as specified. I expect
>>an operation named Resize to cause resizing. If we're really talking
>>about giving the implementation a hint about an appropriate size, then
>>not only does the specification need to be changed, the name also needs
>>to be different (perhaps Size_Hint?).
The semantics for Resize are described above.
> I don't see a strong need to change the name, but I do agree with you that
> there shouldn't be a *requirement* to do some allocation.
There is a requirement for allocation only if the current size is less
than the size specified in the call to Resize.
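[Editor's note: The Size/Length/Resize semantics described in this message
can be sketched as below. This is a minimal illustration, not the reference
implementation: the record representation, the Integer element type and
the doubling policy in Append are assumptions made for the example. Only
the operation names and the invariant Length (V) <= Size (V) come from the
message itself.]

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Resize_Demo is

   type Element_Array is array (Positive range <>) of Integer;
   type Element_Array_Access is access Element_Array;

   procedure Free is
     new Ada.Unchecked_Deallocation (Element_Array, Element_Array_Access);

   --  Assumed representation: an internal array plus the number of
   --  "active" elements.
   type Vector_Type is record
      Elements : Element_Array_Access;
      Last     : Natural := 0;
   end record;

   --  Number of elements actually inserted.
   function Length (V : Vector_Type) return Natural is
   begin
      return V.Last;
   end Length;

   --  Length of the internal array (0 if nothing is allocated yet).
   function Size (V : Vector_Type) return Natural is
   begin
      if V.Elements = null then
         return 0;
      end if;
      return V.Elements'Length;
   end Size;

   --  Expands the internal array to at least Size; does nothing if the
   --  current size is already equal to or greater than Size.
   procedure Resize (V : in out Vector_Type; Size : in Natural) is
      New_Elements : Element_Array_Access;
   begin
      if Size <= Resize_Demo.Size (V) then
         return;
      end if;
      New_Elements := new Element_Array (1 .. Size);
      if V.Elements /= null then
         New_Elements (1 .. V.Last) := V.Elements (1 .. V.Last);
         Free (V.Elements);
      end if;
      V.Elements := New_Elements;
   end Resize;

   --  Grows by doubling when full (an assumed policy).
   procedure Append (V : in out Vector_Type; New_Item : in Integer) is
   begin
      if V.Last = Size (V) then
         Resize (V, Size => Natural'Max (1, 2 * Size (V)));
      end if;
      V.Last := V.Last + 1;
      V.Elements (V.Last) := New_Item;
   end Append;

   V : Vector_Type;

begin
   Resize (V, Size => 100);   --  preallocate
   for I in 1 .. 10 loop
      Append (V, I);
   end loop;
   Resize (V, Size => 50);    --  no effect: current size is already 100
   Ada.Text_IO.Put_Line
     ("Length =" & Natural'Image (Length (V)) &
      ", Size =" & Natural'Image (Size (V)));
   --  prints "Length = 10, Size = 100"
   Free (V.Elements);
end Resize_Demo;
```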
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 9:58 AM
> We compiler writers can probably even get Matt to code up the
> implementation for us. ;-)
Ask, and you shall receive...
The latest version (12 Feb 2004) of the reference implementation has an
example of a sorted map, implemented using the sorted set and by
instantiating its nested generic package Generic_Keys.
There are also two examples of hashed sets, for both definite and
indefinite elements. This standard doesn't have a hashed set but if it
did then this is what it would look like.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040212.zip>
****************************************************************
From: Stephen Leake
Sent: Thursday, February 12, 2004 11:55 AM
> Yup. That's precisely how Type'Size works in Ada; it has a fairly weak
> effect on Obj'Size, but in any case, if you set it, you have to return the
> same value (even if that value has nothing to do with how objects are
> actually stored).
I think that's a bad idea. It means my scenario (preserve the max size
of a container in a config file, and set it next time on startup)
won't work. If I set the max size from yesterday on startup today, but
then the container grows larger today, when I exit and query the size,
I won't get the larger correct size, but just the one I set at the
beginning.
Setting the size should be a hint, but only for a starting point.
Querying the size should always return the current size.
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 12, 2004 12:47 PM
Randy Brukardt wrote:
> Yup. That's precisely how Type'Size works in Ada; it has a fairly
> weak effect on Obj'Size, but in any case, if you set it, you have to
> return the same value (even if that value has nothing to do with how
> objects are actually stored).
Not quite precisely. There are cases where a compiler is required to use
the specified 'Size.
> One allocation per key is a lot more than one allocation per *map*,
> which is what a stringspace implementation takes. (Well, it might
> have to expand if it gets full, but that should be rare. It could
> degrade to one allocation per key if the keys are very, very long,
> but some care in implementation should prevent degrading.)
OK. I misunderstood.
> I have a component like that (it's actually Tom Moran's), but in
> practice, I've *never* used it without using the index values it
> provides to manage some other data in a separate table (at least
> statistics and/or debugging). Even the 'known words' list in the spam
> filter uses the indexes (handles) for debugging. If that's the case,
> why bother having to use a separate component (causing another chance
> of error)?
>
> So I would guess that the "dummy type" would gain some real data in
> 95% of the applications. And that such uses are less than 10% of the
> uses of a map anyway. Since this is a minimal library, we're not
> trying to cover that remaining 0.5%.
It's not "another component"; it's the underlying implementation of the
hashed map component. My point is that we're requiring the
implementation of a hash table, which is a useful component, but not
requiring that it be provided to users. That's like requiring that a
compiler be able to convert strings into numbers, but not having 'Value
in the language. It doesn't require any additional work by implementors,
nor introduce an additional opportunity for errors, but it does increase
the utility of the library.
To me it's a no brainer, as is converting the map part of the "sorted
set" component (Generic_Keys) into its own component: it's no additional
work for implementors, and allows the user to obtain a sorted map with a
single instantiation, instead of 2.
Put another way, the library has 2 different approaches to defining a
map. In one, we have a map component and hide the underlying
implementation (the hash table). In the other, we have the "sorted set"
component, and then a map component implemented in terms of it. We
should at least be consistent, and I argue that our consistency take the
form of providing both the underlying implementation and the map
implemented in terms of it.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 3:21 PM
> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users.
A hash table might be at the wrong level of abstraction (too low). The
hashed map actually takes the level of abstraction up a notch.
In my original proposal, I allowed the user to query each bucket of the
underlying hash table array, but the subcommittee rejected that approach
as too low level, in favor of higher-level First and Succ active
iterator operations.
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 3:43 PM
> Setting the size should be a hint, but only for a starting point.
> Querying the size should always return the current size.
Yes, querying the size should always return the length of the internal array.
If the value specified in the call to Resize is larger than the current
length of the internal array, then the internal array is expanded to at
least the length specified.
****************************************************************
From: Simon J. Wright
Sent: Thursday, February 12, 2004 10:42 AM
> The Size function is analogous to the capacity() member function in the
> STL vector class.
>
> The Resize procedure is analogous to the reserve() member function.
...
Do we really need these operations? I presume that they support
optimisation by allocating extra space ahead of time -- do our users
really need that? (assuming of course that the vector will resize
itself if it finds it needs to).
****************************************************************
From: Matthew Heaney
Sent: Thursday, February 12, 2004 11:10 AM
Yes, it makes the optimization you describe -- the Resize preallocates
an internal array large enough to contain all future insertions.
The optimization is important especially for very large numbers of
elements.
But don't take my word for it. Measure the performance of this procedure:
   procedure Not_Optimized (V : in out Vector_Type) is
   begin
      for I in 1 .. 1_000_000 loop
         Append (V, New_Item);
      end loop;
   end;
and then compare it to this one:
   procedure Optimized (V : in out Vector_Type) is
   begin
      Resize (V, Size => 1_000_000);
      for I in 1 .. 1_000_000 loop
         Append (V, New_Item);
      end loop;
   end;
If you really want to see a difference then use a complex element type,
perhaps one that is controlled and does lots of internal allocation.
I know it makes a difference because I've actually had the problem. In
my streaming media server, when a file is requested I must load large
indexes comprising several hundred thousand elements that describe the
frames in the file (these are 2 hour movies).
When I first wrote the server there was a huge spike in the CPU monitor
whenever I loaded a file, and this tended to disrupt existing streaming
clients. (This is a real-time streaming media server, and I have to
service several hundred clients simultaneously.)
I did some analysis and realized it was population of the index vector
that was the cause of my problem. So I just figured out my total number
of indexes before inserting and then did a Resize. And now all is well.
So performance matters, and therefore we should keep Size and Resize.
Of course, if your vector objects are small, or you don't have any
special performance needs, then you can just ignore Resize and the
vector will work fine.
****************************************************************
From: Robert A. Duff
Sent: Thursday, February 12, 2004 12:01 PM
I would expect the former to do about lg(1_000_000) = 20 allocations,
and the latter to do 1 allocation, presuming the growth is exponential,
which it should be. (E.g. double the size each time you run out
of space.)
> So performance matters, and therefore we should keep Size and Resize.
I agree. I use a similar growable array abstraction quite heavily in my
current project, and there are cases where the code knows the size ahead
of time (or can guess), and I care enough about speed to do the Resize.
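[Editor's note: The allocation estimate above can be checked with a small
simulation. The sketch assumes a vector that starts empty, allocates room
for one element on the first Append, and doubles its capacity whenever it
is full; no real storage is allocated, the program only counts how often
an expansion would occur. Under those assumptions the count for one
million appends is 21, in line with the "about 20" estimate, against a
single allocation when the size is reserved up front.]

```ada
with Ada.Text_IO;

procedure Count_Allocations is
   Target      : constant := 1_000_000;
   Capacity    : Natural := 0;   --  simulated length of the internal array
   Length      : Natural := 0;   --  simulated number of active elements
   Allocations : Natural := 0;
begin
   for I in 1 .. Target loop
      if Length = Capacity then
         --  Full: a doubling implementation would reallocate here.
         Capacity    := Natural'Max (1, 2 * Capacity);
         Allocations := Allocations + 1;
      end if;
      Length := Length + 1;
   end loop;
   Ada.Text_IO.Put_Line ("Allocations:" & Natural'Image (Allocations));
   --  prints "Allocations: 21"
end Count_Allocations;
```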
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Thursday, February 12, 2004 3:29 PM
> Yes, but access types themselves are not tagged. What they point at is irrelevant.
> If you have a formal "type T is tagged private;" no access type will match
> that; it's the same for interfaces.
Still don't understand: if we can do something useful with, say, an array
(or Unbounded_Array) of interface objects then why can't we do the same with
an array of accesses to interface objects - just dereferencing them before
calling a member of the interface?
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 12, 2004 4:58 PM
Because you can't create a container (a map, say) of access types in this
model. Remember, an interface has no implementation, so at some point you
have to have a concrete implementation.
Let me try to give a very simple example:
(* Warning *) This is not a serious proposal! (* End Warning *)
   package Ada.Containers is
      type Element_Interface is interface;
      -- Any element operations here (I don't think there need to be any).
      type Cursor_Interface is interface;
      -- Any common cursor operations here.
   end Ada.Containers;
   package Ada.Containers.Interfaces is
      type Forward_Iterator_Container_Interface is interface;
      function Null_Cursor (Container : Forward_Iterator_Container_Interface)
         return Cursor_Interface'Class is abstract;
      function Front (Container : Forward_Iterator_Container_Interface)
         return Cursor_Interface'Class is abstract;
      procedure Increment (Container : Forward_Iterator_Container_Interface;
                           Cursor : in out Cursor_Interface'Class) is abstract;
      function Element (Container : Forward_Iterator_Container_Interface;
                        Cursor: Cursor_Interface'Class)
         return Element_Interface'Class is abstract;
      ...
      -- (It might make more sense to put the "iterator" operations on the
      -- Cursor_Interface. But then you'd need a separate interface just for
      -- element access through a cursor.)
   end Ada.Containers.Interfaces;
   with Ada.Containers.Interfaces;
   generic
      type Key_Type is private;
      type Element_Type is new Element_Interface;
      ... -- As before
   package Ada.Containers.Maps is
      type Map_Type is new
         Ada.Containers.Interfaces.Forward_Iterator_Container_Interface
         with private;
      -- Of course, other useful interfaces also would be included here.
      -- Probably including a "map" one.
      type Cursor_Type is new Cursor_Interface with private;
      ... -- As before. (With appropriate Null_Cursor and Increment routines).
   end Ada.Containers.Maps;
Now, to use this, the element type has to 'have' the Element_Interface
interface:
type My_Element_Type is new Ada.Containers.Element_Interface ... with ...;
You can't instantiate the container with a scalar type or an access type or an
array type or any record that doesn't have the Element_Interface interface.
Now, the point of all of this is that you now can write an iteration routine
that will work for any container having the
Forward_Iterator_Container_Interface. For instance, to create a passive
iterator, you could do (this of course isn't useful, but the ability to write
such things is):
   generic
      with procedure Process (Element : in Element_Interface'Class);
   procedure Iterator (Container : Forward_Iterator_Container_Interface'Class);
   procedure Iterator (Container : Forward_Iterator_Container_Interface'Class) is
      Current : Cursor_Interface'Class := Front (Container);
   begin
      while Current /= Null_Cursor (Container) loop
         Process (Element (Container, Current));
         Increment (Container, Current);
      end loop;
   end Iterator;
Moreover, the instantiations are pretty much the same as the current
proposal. But the element types are limited to tagged types.
****************************************************************
From: Ehud Lamm
Sent: Thursday, February 12, 2004 2:43 AM
But signature packages would work ok, wouldn't they?
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 12, 2004 5:06 PM
Signature packages violate the meta-rule about ease of instantiation: as few
instantiations as possible to get a usable container. (That's one
instantiation, of course.) As far as I can tell, to use them like
interfaces, they'd have to be a parameter to the generic container package.
But perhaps you had something else in mind.
In any case, I don't like signature packages. They add layers of overhead on
a generic sharing implementation (every generic package has a cost, the more
you use, the more that cost is), turning the performance of pretty much
anything into that of bad Java code. (That's not a problem if the signature
doesn't contain anything "expensive", but trying to define that - and work
around it - is a fool's game.)
****************************************************************
From: Randy Brukardt
Sent: Friday, February 13, 2004 12:13 AM
Jeffrey Carter:
> > Yup. That's precisely how Type'Size works in Ada; it has a fairly
> > weak effect on Obj'Size, but in any case, if you set it, you have to
> > return the same value (even if that value has nothing to do with how
> > objects are actually stored).
>
> Not quite precisely. There are cases where a compiler is required to use
> the specified 'Size.
Not for a (sub)type. 13.3(48) says that an object's size is *at least* as
large as the specified size. Anything else said is "advice".
...
> It's not "another component"; it's the underlying implementation of the
> hashed map component. My point is that we're requiring the
> implementation of a hash table, which is a useful component, but not
> requiring that it be provided to users. That's like requiring that a
> compiler be able to convert strings into numbers, but not having 'Value
> in the language. It doesn't require any additional work by implementors,
> nor introduce an additional opportunity for errors, but it does increase
> the utility of the library.
Not true at all. Building a separate hash table component and then building
a map on top of that would be a horrible implementation performance-wise.
Lots of extra call and generic overhead. So, in practice, they'd have
completely separate implementations -- thus, you'd be doubling the work.
Moreover, the component you're describing (a hash table without elements)
wouldn't have any place to *put* elements. So I don't see how you could even
use it to implement the map. (The hash table component you're suggesting
would return a Cursor object to represent each key, but that item isn't an
index that you could use in a sequence. So how would you associate a key
from the hash table with an element? A linear list would work, but would
essentially make the hash table useless.)
What I suspect would happen in practice is that the relatively useless hash
table component would be implemented in terms of a map with a null record
element type. What's the point in that - the user can do that themselves if
they need it?
> To me it's a no brainer, as is converting the map part of the "sorted
> set" component (Generic_Keys) into its own component: it's no additional
> work for implementors, and allows the user to obtain a sorted map with a
> single instantiation, instead of 2.
Matt will tell you that the difference between a Sorted_Set using
Generic_Keys and a Map (any kind) is that the key doesn't have a separate
existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
is separate from the element. There's obviously a significant space
advantage to avoiding duplicate keys.
I originally deleted the Generic_Keys component as redundant (because I too
thought it was a Map), then put it back after a discussion on C.L.A. showed
how important it is.
Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
people have asked that the List be put back. But that would quickly balloon
the proposal to double its size, and in any case smacks of "feeping
creaturism". :-)
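[Editor's note: The point that the key "is part of the element" in a
sorted set with Generic_Keys can be illustrated as below. The sketch uses
the package and operation names of the later Ada 2005 standard
(Ada.Containers.Ordered_Sets and its nested generic Generic_Keys), not the
names of the draft under discussion here, and assumes an Ada 2005 or later
compiler. The Employee type and the Id_Of function are hypothetical.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Keyed_Set_Demo is

   --  The key (Id) is stored once, inside the element itself.
   type Employee is record
      Id     : Positive;
      Salary : Natural;
   end record;

   --  Elements are ordered by their embedded key.
   function "<" (Left, Right : Employee) return Boolean is
   begin
      return Left.Id < Right.Id;
   end "<";

   --  Extracts the key from an element.
   function Id_Of (E : Employee) return Positive is
   begin
      return E.Id;
   end Id_Of;

   package Employee_Sets is new Ada.Containers.Ordered_Sets (Employee);

   --  Second instantiation: gives key-based operations on the same set.
   package Id_Keys is new Employee_Sets.Generic_Keys
     (Key_Type => Positive,
      Key      => Id_Of);

   S : Employee_Sets.Set;

begin
   Employee_Sets.Insert (S, (Id => 7, Salary => 100));
   Employee_Sets.Insert (S, (Id => 3, Salary => 200));

   --  Look an element up by key alone, map-style, without a separate
   --  copy of the key being held by the container.
   Ada.Text_IO.Put_Line
     ("Salary of employee 3:" &
      Natural'Image (Id_Keys.Element (S, 3).Salary));
   --  prints "Salary of employee 3: 200"
end Keyed_Set_Demo;
```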
****************************************************************
From: Matthew Heaney
Sent: Friday, February 13, 2004 8:32 AM
> Not true at all. Building a separate hash table component and then building
> a map on top of that would be a horrible implementation performance-wise.
Gulp! I guess Randy hasn't looked at the reference implementation yet...
> Lots of extra call and generic overhead. So, in practice, they'd have
> completely separate implementations -- thus, you'd be doubling the work.
Jeff may have assumed (perhaps by looking at the reference
implementation) that implementors would implement the (hashed) map as a
layer on top of a separate generic hash table component. But as Randy
notes, implementors won't necessarily implement the map container that
way, and so Jeff is basically advocating that another component
(specifically, a low-level hash table data structure) be added to the
standard library.
> Matt will tell you that the difference between a Sorted_Set using
> Generic_Keys and a Map (any kind) is that the key doesn't have a separate
> existence in the Sorted_Set; it's part of the element. Whereas in a Map, it
> is separate from the element. There's obviously a significant space
> advantage to avoiding duplicate keys.
What Randy told you Matt would tell you is correct...
> I originally deleted the Generic_Keys component as redundant (because I too
> thought it was a Map), then put it back after a discussion on C.L.A. showed
> how important it is.
Yes. It allows the instantiator to take advantage of properties of the
generic actual set element type that the generic set itself isn't privy
to. See for example the Indefinite_Sets example in the reference
implementation.
> Matt will also tell you that he'd prefer both a Sorted_Map and a Hashed_Set,
> and Tucker would tell you that he'd prefer an Unsorted_Set. And dozens of
> people have asked that the List be put back. But that would quickly balloon
> the proposal to double its size, and in any case smacks of "feeping
> creaturism". :-)
What Randy told you Matt would tell you is once again correct...
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 13, 2004 12:54 PM
I've updated Truc: the "100% Ada" claim is now true. The URL is the same
(www.liacc.up.pt/~maa/containers/truc.ada)
Truc features an implementation of indefinite elements using streams,
an alternative to Matt's approach using controlled deallocation. This could be of
interest to implementors. But remember Truc was a proof-of-concept and is
missing many standard functions.
The other principal feature of Truc is now merely academic: it chooses
automatically the implementation most appropriate to the actual element type
w.r.t. definiteness. It is now settled that the choice will be manual (done
by the user).
****************************************************************
From: Dan Eilers
Sent: Friday, February 13, 2004 6:44 PM
I think it's a little too soon to say that manual choice is settled.
Certainly it is agreed that there should not be any overhead from
support of indefinite types forced onto users of definite types.
But a user really probably prefers not to have to worry about which
flavor of each container to instantiate, just like users of
generic_elementary_functions currently don't have to explicitly
select between single and double precision versions.
You earlier proposed a language extension as an aside:
> Aside. Of course there is still no standard means to do this, but it
> would be a nice extension. Conditional compilation of generic bodies
> based on instantiation properties. Variant units :-)
> generic
> type T is private;
> ...
> package G is
> when T'Definite =>
> ...;
> when others =>
> ...;
> end;
> (On the subject of conditional compilation, see also the recent Ada
> Preprocessor thread on CLA.)
This looks like too large of a change for the benefit, but there
may be a simpler change that would work. For example, by extending
the syntax for renames to allow a conditional expression, as in:
generic package p1 is
end p1;
generic package p2 is
end p2;
with p1, p2;
generic package p3 renames (if condition then p1 else p2);
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004 9:38 PM
> Because you can't create a container (a map, say) of access types in this
> model. Remember, an interface has no implementation, so at some point you
> have to have a concrete implementation.
I remember that, but I still can't get how it may be possible that
1) we can create a container of interfaces and
2) we can create a container of accesses and
3) we have accesses to interfaces
but at the same time we cannot create a container of accesses to interfaces.
I don't understand how the delayed implementation of interfaces may create
this situation. Let me follow your example:
> Let me try to give a very simple example:
>
> (* Warning *) This is not a serious proposal! (* End Warning *)
>
> package Ada.Containers is
> type Element_Interface is interface;
Let's change the above line to:
type Item_Interface is interface;
type Element_Access is access all Item_Interface;
> -- Any element operations here (I don't think there need to be any).
>
> type Cursor_Interface is interface;
> -- Any common cursor operations here.
> end Ada.Containers;
>
> package Ada.Containers.Interfaces is
> type Forward_Iterator_Container_Interface is interface;
> function Null_Cursor (Container : Forward_Iterator_Container_Interface) return
> Cursor_Interface'Class is abstract;
> function Front (Container : Forward_Iterator_Container_Interface) return
> Cursor_Interface'Class is abstract;
> procedure Increment (Container : Forward_Iterator_Container_Interface;
> Cursor : in out Cursor_Interface'Class) is abstract;
> function Element (Container : Forward_Iterator_Container_Interface;
> Cursor: Cursor_Interface'Class) return Element_Interface'Class is abstract;
and the above function to:
function Element (Container : Forward_Iterator_Container_Interface;
Cursor: Cursor_Interface'Class) return Element_Access is abstract;
function Item (Container : Forward_Iterator_Container_Interface;
Cursor: Cursor_Interface'Class) return Item_Interface'Class is abstract;
> ...
> -- (It might make more sense to put the "iterator" operations on the Cursor_Interface.
> -- But then you'd need a separate interface just for element access through a cursor.)
> end Ada.Containers.Interfaces;
>
> with Ada.Containers.Interfaces;
> generic
> type Key_Type is private;
> type Element_Type is new Element_Interface;
change above line to
type Item_Type is new Item_Interface;
type Element_Type is access all Item_Type;
> ... -- As before
> package Ada.Containers.Maps is
> type Map_Type is new
> Ada.Containers.Interfaces.Forward_Iterator_Container_Interface with private;
> -- Of course, other useful interfaces also would be included here. Probably
> -- including a "map" one.
> type Cursor_Type is new Cursor_Interface with private;
>
> ... -- As before. (With appropriate Null_Cursor and Increment routines).
> end Ada.Containers.Maps;
>
> Now, to use this, the element type has to 'have' the Element_Interface interface:
> type My_Element_Type is new Ada.Containers.Element_Interface ... with
> ...;
correspondingly:
Now, to use this, the element type must be access to a type that has to 'have' the
Item_Interface interface:
type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
type My_Element_Type is access all My_Item_Type;
> You can't instantiate the container with a scalar type or an access type or
> an array type or
> any record that doesn't have the Element_Interface interface.
But now, with the above changes, we can instantiate the container with an access
to a tagged type that has the Item_Interface interface.
Where am I wrong here - at which point/step?
****************************************************************
From: Randy Brukardt
Sent: Friday, February 13, 2004 9:47 PM
...
> correspondingly:
>
> Now, to use this, the element type must be access to a type that has to 'have' the
> Item_Interface interface:
> type My_Item_Type is new Ada.Containers.Item_Interface ... with ...;
> type My_Element_Type is access all My_Item_Type;
>
> > You can't instantiate the container with a scalar type or an access type or
> > an array type or any record that doesn't have the Element_Interface interface.
>
> But now, with the above changes we can instantiate the container with access
> to a tagged type that has Item_Interface interface.
>
> Where I am wrong here - in which point/step?
This works, of course, but now you can only instantiate with
access-to-interfaces. That's even more limiting than just interfaces -
because you have to do all of the memory management yourself. If you've been
following along here, I'm sure you've noticed that that won't do.
You could of course support this as an alternative implementation with both
sets of stuff around. But then you've instantly doubled the size of the
library -- and you still can't have a container of floats or of arrays
(especially of unconstrained arrays). Wrappers are very space-inefficient in
the first case, and barely possible for unconstrained arrays (the code to
use them will be very ugly).
****************************************************************
From: Matthew Heaney
Sent: Friday, February 13, 2004 9:08 AM
The current version of the reference implementation has examples of
indefinite sets, maps, and vectors.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040213.zip>
However, I have discovered a potential anomaly in indefinite containers
that I wanted to make users aware of.
An indefinite container is implemented by storing a pointer to the
(indefinite) element, and doing the allocation and deallocation of the
element behind the scenes during insertion and deletion.
The issue comes up in the item-less forms of insertion. In that case,
there is a null pointer for the element. This has several consequences.
Consider the vector. When we do an item-less insert, does that mean we
copy the internal pointer up to the next position, and leave a null
pointer at the insertion position? Or do we leave the original element
there and make a copy of the element to slide up?
When we delete vector elements, do we move pointers down, and leave null
element pointers behind? Or are we required to make a copy to slide down?
What should the passive iterator do when it hits a null element pointer?
Skip that position or just raise Constraint_Error?
Should we generalize Replace_Element, to allow a "null element" as the
replacement value?
Does Generic_Element return a null pointer if the element pointer is
null, or does it raise CE?
What should sort do with null elements? Assume that a null element is
always less than a non-null element?
This affects streaming of elements too, because you have to stream out
an extra bit to indicate whether the element is null or not.
My tentative assumption is that we'll have to omit the item-less
insertion operations in the indefinite containers. This mostly applies
only to vector and map, but I still have to analyze the behavior of
indefinite set Generic_Keys nested package.
The reference implementation doesn't do anything special for indefinite
vectors. The indefinite map handles null elements. I will fix both
this weekend, as I prepare an errata list for Randy.
If the developers who want indefinite containers have an opinion about
these matters then please speak up.
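[Editor's note: The storage model described above can be sketched as
follows. This is only an illustration, not the reference implementation:
the fixed-size Slot_Array, the Element function and the procedure name are
hypothetical. Each slot holds a pointer to an indefinite element (a String
here), a slot that was never assigned holds null, and reading such a slot
raises Constraint_Error, which is one of the behaviors under discussion.]

```ada
with Ada.Text_IO;
with Ada.Unchecked_Deallocation;

procedure Indefinite_Slot_Demo is
   type String_Access is access String;
   procedure Free is
     new Ada.Unchecked_Deallocation (String, String_Access);

   --  Hypothetical stand-in for the container's internal storage.
   --  Access components are initialized to null by the language.
   type Slot_Array is array (1 .. 4) of String_Access;
   Slots : Slot_Array;

   --  Reading an unassigned (null) slot raises Constraint_Error.
   function Element (Index : Positive) return String is
   begin
      if Slots (Index) = null then
         raise Constraint_Error;
      end if;
      return Slots (Index).all;
   end Element;
begin
   --  "Insertion" allocates a copy of the element behind the scenes.
   Slots (1) := new String'("alpha");
   Slots (2) := new String'("a much longer element");

   Ada.Text_IO.Put_Line (Element (1));

   begin
      Ada.Text_IO.Put_Line (Element (3));
   exception
      when Constraint_Error =>
         Ada.Text_IO.Put_Line ("slot 3 is unassigned");
   end;

   --  "Deletion" deallocates; Free on a null access value is harmless.
   for I in Slots'Range loop
      Free (Slots (I));
   end loop;
end Indefinite_Slot_Demo;
```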
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Friday, February 13, 2004 10:19 AM
Matthew Heaney wrote:
> I have discovered a potential anomaly in indefinite containers
> that I wanted to make users aware of.
>
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion. In that case,
> there is a null pointer for the element. This has several consequences.
>
>...
>
> Should we generalize Replace_Element, to allow a "null element" as the
> replacement value?
No.
>...
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.
Yes, I think that this is the right way to go: exclude item-less elements from
containers for indefinite types altogether. This is a clear and well-justified
restriction, and it will not significantly harm the usefulness of those containers.
From the user's viewpoint, I believe that this restriction is a fair price for
admitting indefinite types to containers (with basic memory management),
at least in a basic library.
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 13, 2004 10:45 AM
First what is item-less insertion? Checking the AI I guess it is this:
<<
procedure Insert_N
(Vector : in out Vector_Type;
Before : in Index_Type'Base;
Count : in Natural);
Equivalent to Insert_N (Vector, Before, Count, New_Item), with the
difference that the elements in the Count positions starting at Before
are not assigned.
>>
That is (correct me if I'm wrong), the inserted elements hold garbage.
Garbage is garbage, definite or indefinite.
Any attempt to read garbage should raise an exception (I'm checking now if the
AI has this provision; it should).
Sorry, I'm not following strictly your questions, but I think I'm answering
them.
Another 'problem' is what a proper multiple insertion (Insert_N/4 with Count >
1) does for indefinite elements: repeat the same pointer or create N copies
of the item? Value semantics, man. Create N copies.
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 13, 2004 11:18 AM
Matt,
I'm rechecking your questions one by one now, against the 'philosophy'
expressed in my previous post.
...
> Consider the vector. When we do an item-less insert, does that mean we
> copy the internal pointer up to the next position,
yes
> and leave a null
> pointer at the insertion position?
A null or another internal sign of garbage.
> Or do we leave the original element
> there and make a copy of the element to slide up?
This does not make sense. The user is inserting garbage.
> When we delete vector elements, do we move pointers down, and leave null
> element pointers behind? Or are we required to make a copy to slide down?
Move pointers. And leave *nothing* behind. Shrink the vector, as per the spec.
> What should the passive iterator do when it hits a null element pointer?
Whatever it does when it hits an unassigned (garbage) element. Definite or
indefinite.
> Skip that position or just raise Constraint_Error?
Definitely raise something. But in definite elements too. And maybe a more
specific exception. Value_Error. Data_Error.
> Should we generalize Replace_Element, to allow a "null element" as the
> replacement value?
I'm not sure I understand. Please do not create another special entity.
Definitely not Null_Element, which the user would have to define.
> Does Generic_Element return a null pointer if the element pointer is
> null, or does it raise CE?
See above.
> What should sort do with null elements? Assume that a null element is
> always less than a non-null element?
I say raise something.
> This affects streaming of elements too, because you have to stream out
> an extra bit to indicate whether the element is null or not.
Again, raise.
That is, in practice, forbid sort or (container-wide) streaming of a container
with (yet) unassigned elements.
Let the user create his own 'null' element value, if he needs to process it.
> My tentative assumption is that we'll have to omit the item-less
> insertion operations in the indefinite containers.
No. Or, if omitting it, omit it in definite too.
> This mostly applies
> only to vector and map, but I still have to analyze the behavior of
> indefinite set Generic_Keys nested package.
With the philosophy subsumed in my replies, that analysis should be clear ;-)
Please note my solution implies containers have a 'validity' state. Namely, if
they contain unassigned elements they are invalid w.r.t. some operations e.g.
sort. Maybe a Valid predicate should be added to the spec. Alternatively, we
can simply remove the creation of unassigned elements i.e. omit the item-less
insertion.
> The reference implementation doesn't do anything special for indefinite
> vectors. The indefinite map handles null elements. I will fix both
> this weekend, as I prepare an errata list for Randy.
>
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.
****************************************************************
From: Stephen Leake
Sent: Friday, February 13, 2004 11:48 AM
Matthew Heaney <mheaney@on2.com> writes:
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
ok, good.
> The issue comes up in the item-less forms of insertion. In that
> case, there is a null pointer for the element.
Why would I want to do this? Seems bogus to me. Just remove this
operation, all the problems go away!
I had not noticed these versions of Insert before. Do you have an
example of when they are useful?
Note that for definite Item_Type, you can still get Constraint_Error
from an itemless Insert, unless the element is initialized to some
valid value.
> Consider the vector. When we do an item-less insert, does that mean
> we copy the internal pointer up to the next position, and leave a null
> pointer at the insertion position?
Yes.
> Or do we leave the original element there and make a copy of the
> element to slide up?
Why should the null pointer case be any different than the non-null case?
> When we delete vector elements, do we move pointers down, and leave
> null element pointers behind? Or are we required to make a copy to
> slide down?
I guess you mean what do you leave in vector (last + 1). I would move
pointers down, and leave a null pointer (again, this is the same
whether we have null inserts or not).
> What should the passive iterator do when it hits a null element
> pointer? Skip that position or just raise Constraint_Error?
Raise Constraint_Error. The user asked for it.
> Should we generalize Replace_Element, to allow a "null element" as
> the replacement value?
no. Unless you have an example of when that would be useful.
> Does Generic_Element return a null pointer if the element pointer is
> null, or does it raise CE?
Raise Constraint_Error.
It might be nice to have a version of Generic_Element that returns the
pointer, rather than the element. As Maps.Generic_Element does.
> What should sort do with null elements? Assume that a null element is
> always less than a non-null element?
Raise Constraint_Error.
> This affects streaming of elements too, because you have to stream
> out an extra bit to indicate whether the element is null or not.
Raise Constraint_Error.
> My tentative assumption is that we'll have to omit the item-less
> insertion operations in the indefinite containers. This mostly
> applies only to vector and map, but I still have to analyze the
> behavior of indefinite set Generic_Keys nested package.
Ok by me.
> The reference implementation doesn't do anything special for
> indefinite vectors. The indefinite map handles null elements. I will
> fix both this weekend, as I prepare an errata list for Randy.
>
> If the developers who want indefinite containers have an opinion about
> these matters then please speak up.
I have :).
****************************************************************
From: Matthew Heaney
Sent: Friday, February 13, 2004 5:33 PM
>>The issue comes up in the item-less forms of insertion. In that
>>case, there is a null pointer for the element.
>
> Why would I want to do this? Seems bogus to me. Just remove this
> operation, all the problems go away!
That's what I'll do.
> I had not noticed these versions of Insert before. Do you have an
> example of when they are useful?
Because you don't always have a value to assign immediately. What you
want to do is make space in the vector for all the items, and then do
the assignment. For example, suppose you want to copy a list into a vector:
   V : Vector_Type;

   procedure Copy (List : List_Type; I : Index_Type) is
      C : Cursor_Type := First (List);
      J : Index_Type := I;
   begin
      --  Open a gap of Length (List) unassigned positions starting at I.
      Insert_N (V, Before => I, Count => Length (List));
      --  Fill the gap by walking the list with cursor C.
      for K in 1 .. Length (List) loop
         Replace_Element (V, Index => J, By => Element (C));
         Increment (C);
         J := Index_Type'Succ (J);
      end loop;
   end Copy;
If you don't do it this way, then your time complexity is O(n*m) instead
of O(n+m).
>>My tentative assumption is that we'll have to omit the item-less
>>insertion operations in the indefinite containers. This mostly
>>applies only to vector and map, but I still have to analyze the
>>behavior of indefinite set Generic_Keys nested package.
>
> Ok by me.
This simplifies the model. Let's do it this way.
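[Editor's note: The two-step pattern described in this message (open a gap
once, then assign into it) can be sketched against the container packages
as later standardized in Ada 2005. The sketch assumes that
Ada.Containers.Vectors.Insert_Space plays the role of the item-less
Insert_N discussed here; the instantiations and the names Copy_Demo,
Int_Vectors and Int_Lists are hypothetical.]

```ada
with Ada.Containers.Vectors;
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Copy_Demo is
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists
     (Element_Type => Integer);

   V : Int_Vectors.Vector;
   L : Int_Lists.List;

   --  Copy every element of List into V, starting at index I.
   procedure Copy (List : Int_Lists.List; I : Positive) is
      C : Int_Lists.Cursor := List.First;
      J : Positive := I;
   begin
      --  One slide of the vector's tail, however long the list is.
      V.Insert_Space (Before => I, Count => List.Length);

      --  Assign into the gap; the slots are unassigned until now.
      while Int_Lists.Has_Element (C) loop
         V.Replace_Element (J, Int_Lists.Element (C));
         Int_Lists.Next (C);
         J := J + 1;
      end loop;
   end Copy;
begin
   V.Append (1);
   V.Append (5);
   L.Append (2);
   L.Append (3);
   L.Append (4);

   Copy (L, 2);

   --  Prints the values 1 through 5 in order.
   for K in V.First_Index .. V.Last_Index loop
      Ada.Text_IO.Put (Integer'Image (V.Element (K)));
   end loop;
   Ada.Text_IO.New_Line;
end Copy_Demo;
```

Inserting the list elements one at a time at index I would instead slide
the tail of the vector once per element, which is the O(n*m) cost
mentioned above.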
****************************************************************
From: Randy Brukardt
Sent: Friday, February 13, 2004 10:54 PM
Matt Heaney wrote:
> An indefinite container is implemented by storing a pointer to the
> (indefinite) element, and doing the allocation and deallocation of the
> element behind the scenes during insertion and deletion.
>
> The issue comes up in the item-less forms of insertion. In that case,
> there is a null pointer for the element. This has several consequences.
Well, you have to decide precisely what containers you are creating. (That's
the designer's job, I think).
Consider the Sequence.
(Aside: I don't think the name "Vector" is going to make it, given that
AI-296 has about 10 years dibs on that name. And I don't think we want two
different things with the same name in the standard...)
If your container supports sparse sequences, then you need to decide what it
means to not have an element at a position. And whatever that decision is,
it probably ought to be the same for both forms. I tend to agree that
referencing an empty element should cause an exception in that case (it's
better than returning garbage). (Which means that Sorting and [passive]
Iteration would raise that exception when the first empty element was
reached.)
OTOH, if your container does not support sparse sequences, then I don't see
why you ought to have item-less forms of insertion in the first place.
Inserting nothing is a mistake if you can't have undefined elements.
In either case, it is clear that deletion should shrink the (virtual) length
of the sequence. To do anything else would mean that you couldn't reliably
iterate on a sequence that has ever been deleted from. That seems goofy. Of
course, that doesn't mean that you need to change the length of the internal
array. And doing so means that it is irrelevant how items past the logical
end of the array are represented.
I do think that if you support sparse sequences, you need to be able to
stream them in and out. They seem to be potentially useful (imagine a
histogram vector; values that never occurred would not need any value at
all), and if they are legitimate at all, they have to be streamable. Of
course, if you don't support sparse sequences and you get one anyway, that's
a bug. Crashing is fine. :-)
I know that at least some readers have thought that sparse sequences are
supported. So a definitive decision on that is needed.
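[Editor's note: The point that deletion shrinks the logical length, while
the internal array need not shrink, can be illustrated with the vector
package as later standardized in Ada 2005. This is a sketch only; the
instantiation and the name Delete_Demo are hypothetical, and the value of
Capacity is implementation-dependent, so it is only compared, not printed.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Delete_Demo is
   use type Ada.Containers.Count_Type;

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Int_Vectors.Vector;
begin
   for I in 1 .. 5 loop
      V.Append (I * 10);
   end loop;

   --  Remove the element at index 2; later elements slide down.
   V.Delete (Index => 2, Count => 1);

   --  The logical length shrinks from 5 to 4.
   pragma Assert (V.Length = 4);

   --  The internal storage may stay as large as it was.
   pragma Assert (V.Capacity >= V.Length);

   --  Iteration covers only the logical contents: 10, 30, 40, 50.
   for K in V.First_Index .. V.Last_Index loop
      Ada.Text_IO.Put (Integer'Image (V.Element (K)));
   end loop;
   Ada.Text_IO.New_Line;
end Delete_Demo;
```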
****************************************************************
From: Robert A. Duff
Sent: Saturday, February 14, 2004 9:56 AM
> (Aside: I don't think the name "Vector" is going to make it, given that
> AI-296 has about 10 years dibs on that name. And I don't think we want two
> different things with the same name in the standard...)
I don't really agree. They are widely-separated enough that confusion
can be avoided.
We already have "dispatching", which means an indirect call when you're
talking about tagged types, but means choosing which task to run when
you're talking about tasks. "Pragma Controlled" and
"Finalization.Controlled" are totally unrelated. A "stub" in the DS
Annex has something to do with inter-process communication; a "stub" in
the core language is a syntactic placeholder for a body. Probably
more...
So there's precedent for using confusing terminology when
convenient. ;-)
"Vector" is good because it matches what other languages call the thing,
and it's short, unlike "Growable_Array" and the like.
[snipped stuff I agree with]
> I know that at least some readers have thought that sparse sequences are
> supported. So a definitive decision on that is needed.
Yes, this is another case where I think the programmer needs to know
(via impl advice or whatever) what's going on under the hood.
****************************************************************
From: Nick Roberts
Sent: Saturday, February 14, 2004 3:51 PM
Randy Brukardt wrote:
>>Since an indefinite Key_Type is required for
>>Containers.Maps.Strings, why not make that capability available to the
>>users?
>
> We definitely expect that the strings container will use a purpose-built
> data structure for storing strings, not some general indefinite item
> capability. Ways to compactly and efficiently store sets of varying size
> strings are well known and commonly used.
>
> Such algorithms could be extended to a general "unconstrained array of
> elementary", but that hardly seems to be a worthwhile definition for keys.
The key value of each element stored in a map (implemented as a hashed
array) must also be stored. Since the Element_Type is definite, making the
Key_Type definite as well makes it possible for the key values (as well as
the element values) to be stored in a fixed array.
This has the advantage of making the implementation simpler, but the
disadvantage of not supporting indefinite key types (which I reckon would
be useful in a significant minority of cases).
Simplifying the implementation has two benefits: implementation costs are
reduced and the risk of failure (bugs) reduced; executional efficiency
(speed more than memory use in this situation) is likely to be increased.
I understand Randy is arguing that executional efficiency should be
considered of relatively low importance for these containers, and I agree.
On the other hand, implementation simplification is, I suspect, going to be
considered quite important by the ARG (and WG9?).
I would, on balance, prefer an indefinite key type, but I've set out the
reasons why a definite key type would be preferred, and I would guess these
reasons would prevail.
>>Another point: Containers.Vectors.Size should return Index_Type'Base,
>>and the Size parameter in Resize should also be Index_Type'Base. It's
>>confusing to have different types for Size and Index.
>>
>>There's also a problem if Natural'Last < Index_Type'Last; you
>>can't have a vector that contains every index!
> ...
> So I don't see a great solution. I wondered about using "Hash_Type" here (it
> has the correct properties), but that seems like a misuse of the type (and a
> bad idea in a library that most Ada programmers will read - you want to show
> them good style in standard libraries).
My preferred solution would be to remove the Index_Type generic parameter
altogether, and make the index type Standard.Positive. I believe this would
have the advantage of simplifying the package from the user's point of
view, it would solve at a stroke the problems mentioned above, and I
believe that no-one in practice will ever need to use a different index type.
****************************************************************
From: Robert A. Duff
Sent: Sunday, February 15, 2004 11:57 AM
I disagree. Using different index types for different kinds of arrays
is a very useful way to catch bugs, even when all those index types are
basically just 1..2**31-1. This is true for the normal built-in array
types, and also for growable ones (Vectors).
I have a growable-array generic in my current project that is
instantiated dozens of times, and it has a "range <>" parameter for the
index type. Some instantiations share the same index type, but most
have their own, and I think that's a Good Thing.
Furthermore, using Positive doesn't solve Randy's problem -- he's got a
compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
address space, so you very well might want Vectors longer than
Positive'Last.
Furthermore, if the Index_Type is "range <>" (which I think it should
be), then the Size can reasonably be of a subtype declared like this:
subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;
As I said before, allowing Index_Type to be modular or enumeration is
not useful, and introduces anomalies.
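[Editor's note: The bug-catching benefit of distinct index types can be
sketched with the vector package as later standardized in Ada 2005. The
types Row_Index and Col_Index and the instantiation names are hypothetical.
Both index types have the same range, yet the compiler rejects using one
where the other is expected.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Distinct_Index_Demo is
   --  Two distinct integer types with identical ranges.
   type Row_Index is range 1 .. 1_000;
   type Col_Index is range 1 .. 1_000;

   package Row_Vectors is new Ada.Containers.Vectors
     (Index_Type => Row_Index, Element_Type => Integer);
   package Col_Vectors is new Ada.Containers.Vectors
     (Index_Type => Col_Index, Element_Type => Integer);

   Rows : Row_Vectors.Vector;
   Cols : Col_Vectors.Vector;

   R : constant Row_Index := 1;
   C : constant Col_Index := 1;
begin
   Rows.Append (10);
   Cols.Append (20);

   Ada.Text_IO.Put_Line (Integer'Image (Rows.Element (R)));
   Ada.Text_IO.Put_Line (Integer'Image (Cols.Element (C)));

   --  The following line would be rejected at compile time, because
   --  R is a Row_Index and Cols is indexed by Col_Index:
   --
   --     Ada.Text_IO.Put_Line (Integer'Image (Cols.Element (R)));
end Distinct_Index_Demo;
```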
****************************************************************
From: Matthew Heaney
Sent: Sunday, February 15, 2004 1:08 PM
Bob Duff wrote:
> Furthermore, if the Index_Type is "range <>" (which I think it should
> be), then the Size can reasonably be of a subtype declared like this:
>
> subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;
Bob you have my latest API for the vector container. What I did as a
replacement for Natural is this:
type Element_Count is range 0 .. <implementation-defined>;
There's also a Positive_Element_Count subtype.
I don't know if this is the way you want to go but at least it's a start.
I like your idea above, too. One issue is that the T'Last of the
size/length type (Size_Type'Last in your example) needs to be at least the
value of Index_Type'Last - Index_Type'First + 1.
I'm not sure your scheme above will work since Index_Type'Base might not
have all those values. Consider using subtype Natural as the generic
actual index type, which means you have one too many values to represent.
There's always going to be some type that's too big. Suppose I
instantiate the vector with Long_Long_Integer? In that case I don't have
any integer type that can fit the number of values that are theoretically
possible.
I don't think there's any real issue for generic actual index types with a
large range, since you're not going to put that many elements in the
vector container anyway. The problem cases are when you use a type with a
smaller range, e.g.
type My_Index_Type is range -128 .. 127;
The number of possible container elements is 256, but T'Base'Last might
only be 127 (indeed that's all it's required to be).
Of course we could require that users declare their type to have the
required properties:
   Last  : constant := 127;
   First : constant := -128;
   N     : constant := Last - First + 1;
   type My_Index_Type_Base is range First .. N;
   type My_Index_Type is new My_Index_Type_Base
     range First .. Last;
But this is probably too subtle for typical language users.
In the new reference implementation I sent you I use System.Max_Int to
declare the Element_Count type, which means casual users would only have
an issue for a generic actual index type such as Long_Long_Integer (whose
use as an index type I would expect to be rare).
> As I said before, allowing Index_Type to be modular or enumeration is
> not useful, and introduces anomalies.
The generic formal index type was also changed as you suggested, to use
the stronger form "range <>" instead of the weaker form "(<>)".
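[Editor's note: The small-range case discussed in this message can be
worked through in a short program. This is a sketch under the assumptions
of the message itself: the index range is -128 .. 127, and the base type
is declared wide enough to hold the element count. The names
Index_Range_Demo and Index_Base are hypothetical.]

```ada
with Ada.Text_IO;

procedure Index_Range_Demo is
   First : constant := -128;
   Last  : constant := 127;

   --  Number of index values, hence the largest possible length: 256.
   N : constant := Last - First + 1;

   --  A parent type wide enough to hold both every index and the count N.
   type Index_Base is range First .. N;

   --  The index type proper; its base range includes at least First .. N.
   type My_Index_Type is new Index_Base range First .. Last;

   --  The count fits in the base type, although it exceeds
   --  My_Index_Type'Last.
   Count : constant My_Index_Type'Base := N;
begin
   Ada.Text_IO.Put_Line
     ("Index_Type'Last =" & Integer'Image (Integer (My_Index_Type'Last)));
   Ada.Text_IO.Put_Line
     ("Element count   =" & Integer'Image (Integer (Count)));
end Index_Range_Demo;
```

With a plain declaration such as "type My_Index_Type is range -128 .. 127;"
the language only guarantees that the base range covers -128 .. 127, so the
count of 256 is not guaranteed to be representable.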
****************************************************************
From: Ehud Lamm
Sent: Sunday, February 15, 2004 4:34 AM
> Ehud Lamm wrote:
> > But signature packages would work ok, wouldn't they?
>
> Signature packages violate the meta-rule about ease of
> instantiation: as few
> instantiations as possible to get a usable container. (That's one
> instantiation, of course.) As far as I can tell, to use them like
> interfaces, they'd have to be a parameter to the generic
> container package.
>
> But perhaps you had something else in mind.
>
I agree with the rationale behind the meta-rule: simple things should be simple.
The signatures will not be required in order to use the containers. They
will only be required once you try to write code that should work across
containers AND across libraries.
Since this isn't going to be the most common scenario this probably falls
outside the 80/20 guideline, so I'll leave it at that.
Personally, I like signature packages and interface-oriented programming,
and I would have liked the library to encourage this style even more than it
currently does. But what's now on the table is still a big step forward.
****************************************************************
From: Jeffrey Carter
Sent: Sunday, February 15, 2004 9:29 PM
Randy Brukardt wrote:
> Moreover, the component you're describing (a hash table without elements)
> wouldn't have any place to *put* elements. So I don't see how you could even
> use it to implement the map. (The hash table component you're suggesting
> would return a Cursor object to represent each key, but that item isn't an
> index that you could use in a sequence. So how would you associate a key
> from the hash table with an element? A linear list would work, but would
> essentially make the hash table useless.)
Apparently I'm not making myself clear. Consider:
generic -- Hash_Tables
type Element is private;
with function "=" (Left, Right : Element) return Boolean is <>;
with function Hash (Item : Element) return Hash_Value is <>;
package Hash_Tables is
type Hash_Table is private;
procedure Insert (Into : in out Hash_Table; Item : in Element);
-- Inserts Item into Into. If Into contains an Element X such that
-- Item = X, replaces X with Item.
procedure Delete (From : in out Hash_Table; Item : in Element);
-- If From contains an Element X such that Item = X, deletes X
-- from From. Otherwise, has no effect.
function Is_In (Item : Element; Table : Hash_Table) return Boolean;
-- If Table contains an Element X such that Item = X, returns True;
-- Otherwise, returns False
function Get (Item : Element; From : Hash_Table) return Element;
-- If From contains an Element X such that Item = X, returns X.
-- Otherwise, raise Constraint_Error.
private -- Hash_Tables
...
end Hash_Tables;
generic -- Hashed_Maps
type Key_Info is private;
type Element is private;
with function "=" (Left, Right : Key_Info) return Boolean is <>;
with function Hash (Item : Key_Info) return Hash_Value is <>;
package Hashed_Maps is
type Hashed_Map is private;
procedure Insert (Into : in out Hashed_Map; Key : in Key_Info;
Item : in Element);
-- Inserts Key/Item into Into. If Into contains a key X such that
-- Key = X, replaces the Element associated with X with Item.
procedure Delete (From : in out Hashed_Map; Key : in Key_Info);
-- If From contains a key X such that Key = X, deletes X and the
-- Element associated with it from From. Otherwise, has no effect.
function Is_In (Key : Key_Info; Map : Hashed_Map) return Boolean;
-- If Map contains a key X such that Key = X, returns True.
-- Otherwise, returns False.
function Get (Key : Key_Info; Map : Hashed_Map) return Element;
-- If Map contains a key X such that Key = X, returns the Element
-- associated with X. Otherwise, raises Constraint_Error.
private -- Hashed_Maps
type Hash_Node is record
Key : Key_Info;
Item : Element;
end record;
function "=" (Left, Right : Hash_Node) return Boolean;
-- Performs Left.Key = Right.Key.
function Hash (Item : Hash_Node) return Hash_Value;
-- Performs Hash (Item.Key).
package Implementation is new Hash_Tables (Element => Hash_Node);
type Hashed_Map is record
Table : Implementation.Hash_Table;
end record;
end Hashed_Maps;
Insert, Delete, and Is_In should be obvious. Get would be implemented as
Dummy : Hash_Node;
begin -- Get
Dummy.Key := Key;
return Implementation.Get (Dummy, Map.Table).Item;
Obviously a lot of functionality is missing from this simple example,
but it clearly demonstrates how a hash table can be used to implement a
map, while leaving the hash table available for those who are not
storing key/value pairs.
Yes, I know these won't compile :)
****************************************************************
From: Randy Brukardt
Sent: Monday, February 16, 2004 10:19 PM
> Apparently I'm not making myself clear. Consider:
Definitely. :-)
...
> Obviously a lot of functionality is missing from this simple example,
> but it clearly demonstrates how a hash table can be used to implement a
> map, while leaving the hash table available for those who are not
> storing key/value pairs.
OK, what you're calling a Hash Table is what Matt called a Hashed Set. To
me, a hash table is an index without any elements at all - it's used as part
of the implementation of some larger component.
In any case, as I said earlier, that implementation (which is very similar
to Matt's) would be horrible on our compiler. You'd end up with 3 separate
allocations per element, plus a bunch of call overhead. Other compilers'
mileage may vary (although I'd expect most would generate better code
without the extra generic).
So, you cannot assume that there is "no extra cost" here; it would be
another entire component. It would, of course, be very similar to the
"Sorted_Set" component, so it's hard to see that there is enough value to
having a separate container for The Standard, but I'd expect it to appear in
the secondary standard (along with List and Sorted_Map).
****************************************************************
From: Nick Roberts
Sent: Monday, February 16, 2004 5:47 PM
Robert A Duff wrote:
> Nick Roberts wrote:
>
>> My preferred solution would be to remove the Index_Type generic
>> parameter altogether, and make the index type Standard.Positive. I
>> believe this would have the advantage of simplifying the package from
>> the user's point of view, it would solve at a stroke the problems
>> mentioned above, and I believe that no-one in practice will ever need
>> to use a different index type.
>
> I disagree. Using different index types for different kinds of arrays
> is a very useful way to catch bugs, even when all those index types are
> basically just 1..2**31-1. This is true for the normal built-in array
> types, and also for growable ones (Vectors).
I think you are fundamentally wrong on this point, Bob. And I mean
'fundamentally', as I am looking at it from a very purist point of view
(perhaps too purist, I'm not sure). I'll try to explain.
I think arrays (in Ada and similar languages) are used for two
fundamentally different purposes: (a) as a mapping, from the index subtype
to the element subtype; (b) as a sequence of elements.
What marks out the difference between (a) and (b) is that for a sequence,
it is the order of the elements that is of primary importance. A good
example of usage (a) is the array type Schedule in RM95 3.6 (28), which
maps from Day to Boolean. A good example of usage (b) is a String.
In usage (b), the index type is merely used to indicate the relative
positions of the elements of the sequence, and it has long become common
and programming (at least in Ada!) convention to call the first element
number 1, the second number 2, and so on. In mathematics, the set N of
natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost
always used for this purpose. In Ada, the subtype Positive is almost always
used (it is used for String), and I think it makes logical sense to use the
same subtype for this single purpose.
I believe that, in practice, an extensible array will only ever have usage
(b). Therefore, logically, I think the index type should always be Positive.
I think this argument is reinforced by the tangle that using a generic
Index_Type has obviously got you into. If you simply use Positive, the
problems all go away. Isn't that a bit of a hint?
> I have a growable-array generic in my current project that is
> instantiated dozens of times, and it has a "range <>" parameter for the
> index type. Some instantiations share the same index type, but most
> have their own, and I think that's a Good Thing.
Then ask yourself the question: how difficult would it be to remove the
"range <>" parameter and use Positive instead throughout? I suspect you
would find this quite easy to do, and that the result would be easier to
read and understand.
> Furthermore, using Positive doesn't solve Randy's problem -- he's got a
> compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> address space, so you very well might want Vectors longer than
> Positive'Last.
I doubt that very much (that you very well might want Vectors longer than
Positive'Last). Presumably this decision was made having been satisfied
that users would not want any String to be longer than 2**15-1 characters.
Surely it would be silly to expect users to be happy with this constraint
on strings, but rebel against it applying to extensible arrays? Surely, if
users of this implementation really required bigger extensible arrays, they
would almost certainly also demand bigger strings, in which case the right
solution would be to make Integer 32-bit based?
> Furthermore, if the Index_Type is "range <>" (which I think it should
> be), then the Size can reasonably be of a subtype declared like this:
>
> subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;
This might be considered a reasonable solution, but it could go wrong. If
Index_Type'First < 1, it might be possible for an extensible array to reach
a length greater than Index_Type'Base'Last. [I think the term 'length' is
more appropriate than 'size'.]
This solution imposes another subtype (or maybe type) upon the user; one
for each instantiation of the extensible array package, in effect. A user
would be annoyed when, for example, trying to compare the length of one
extensible array to that of another (with a different Index_Type'Base), to
find the compiler complaining:
type Apple_Count is range 0..100; -- maximum of 100 apples
type Orange_Count is range 0..2000; -- maximum of 2000 oranges
subtype Apple_Index is Apple_Count range 1..Apple_Count'Last;
subtype Orange_Index is Orange_Count range 1..Orange_Count'Last;
package Apple_Baskets is
new Ada.Containers.Vectors(Apple_Index,Apple);
package Orange_Baskets is
new Ada.Containers.Vectors(Orange_Index,Orange);
Apple_Basket: Apple_Baskets.Vector_Type;
Orange_Basket: Orange_Baskets.Vector_Type;
...
if Size(Apple_Basket) < Size(Orange_Basket) then
This comparison might not work on some implementations. Worse, it might
work on other implementations, and the user could be pretty mystified as to
why.
if Size(Apple_Basket)
< Apple_Baskets.Size_Type(Size(Orange_Basket)) then
seems ugly, and could raise Constraint_Error, and
if Natural(Size(Apple_Basket)) < Natural(Size(Orange_Basket)) then
seems to defeat the purpose (of not simply using Positive as the index
type), and could also raise Constraint_Error (with Randy's compiler, for
example).
I think my way is simpler and better: instantiations of the package do not
require an Index_Type; there is no need for a separate size/length
(sub)type. It is easier to understand and there is less to go wrong.
As an aside, I would reiterate that I think the name 'Size' for the
ordinality function is confusing, and ought to be 'Length', to accord with
the meaning of the Length attribute.
> As I said before, allowing Index_Type to be modular or enumeration is
> not useful, and introduces anomalies.
And I think replacing Index_Type with Positive would reduce the anomalies
still further.
****************************************************************
From: Randy Brukardt
Sent: Monday, February 16, 2004 10:56 PM
> I believe that, in practice, an extensible array will only ever have usage
> (b). Therefore, logically, I think the index type should always
> be Positive.
That's only true if we're not supporting sparse sequences. (And perhaps not
even then.) I disagree with Bob and Matt that modular indexes aren't useful,
and can even imagine uses for enumeration index types (although that would
be rare enough not to worry about).
> I think this argument is reinforced by the tangle that using a generic
> Index_Type has obviously got you into. If you simply use Positive, the
> problems all go away. Isn't that a bit of a hint?
Yeah, and if we got rid of the generic and just made the elements void *
we'd have less problems still. :-)
Seriously, Ada is about strong typing, and you're suggesting to deny the
programmer the power of strong typing in this package. That's a non-starter
in my view.
...
> > Furthermore, using Positive doesn't solve Randy's problem -- he's got a
> > compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> > address space, so you very well might want Vectors longer than
> > Positive'Last.
>
> I doubt that very much (that you very well might want Vectors longer than
> Positive'Last). Presumably this decision was made having been satisfied
> that users would not want any String to be longer than 2**15-1 characters.
That's a complete fallacy. The reason this decision was made (in 1987!) was
that we wanted to be able to migrate users from our 16-bit MS-DOS compilers
to our 32-bit compilers with as little incompatibility as possible. The
intent was that if a program was recompiled on a 32-bit compiler, it would
run and work, including being able to read and write files in the same
format.
> Surely it would be silly to expect users to be happy with this constraint
> on strings, but rebel against it applying to extensible arrays? Surely, if
> users of this implementation really required bigger extensible arrays, they
> would almost certainly also demand bigger strings, in which case the right
> solution would be to make Integer 32-bit based?
If someone wants a 32-bit string, all they have to do is write:
type Long_Natural is range 0 .. 2**31-1;
subtype Long_Positive is Long_Natural range 1 .. Long_Natural'Last;
type Long_String is array (Long_Positive range <>) of Character;
which works fine (except for the language-defined packages). Moreover, this
will work on essentially any Ada compiler (including our 16-bit MS-DOS
compilers) without any dependence on the definitions of predefined types.
OTOH, making Integer 32-bit would use more data memory (potentially a lot
more), and could make existing files unreadable. The amount of pain for a
programmer to change from 16-bit Integer to 32-bit Integer depends on the
code of course, but it can be worse than moving to another compiler
altogether. We don't want to be encouraging our customers to move to another
vendor!
The only real option would be to have a compiler switch of some sort to
select which is used, but that would require lots of work in the compiler -
everything assumes a single definition for Standard. (Yes, we've studied it
seriously, as the choice of 16-bit for Integer is a significant portability
issue - far too many people assume the range of that type, where if they
really care about the range, they should declare their own type.) There are
many other things of more value to our customers at this time.
No Ada program should depend on predefined elementary types. Period.
Unfortunately, type String drags in Natural, leaving no real chance to
enforce a decent Ada style (you can't easily tell when a use of Natural is
for indexing String, or when it is being abused). That's a bug in the Ada
design, but one we're going to have to live with.
> This solution imposes another subtype (or maybe type) upon the user; one
> for each instantiation of the extensible array package, in effect. A user
> would be annoyed when, for example, trying to compare the length of one
> extensible array to that of another (with a different Index_Type'Base), to
> find the compiler complaining:
I agree, but not with your solution. Clearly, there should be a Size_Type
next to Hash_Type in Ada.Containers. If you actually need to do math on it
(which should be very rare), you'd need a "use type
Ada.Containers.Size_Type;", but with any decent style, you'll need that no
matter what the type is or where it is declared. You don't want it in the
generic unit (for the reasons you stated), Natural is clearly bad (use
predefined scalar types only for String in new code - we want to show
readers of the standard good style), so a type is needed somewhere fairly
high up in the hierarchy.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 16, 2004 11:54 PM
I got rid of the subtype Natural in the container packages, per Randy's
request.
I modified the proposal and the reference implementation so that each
generic package declares its own modular Element_Count type. In the case
of the map it just derives from Hash_Type; in the vector and set it's its
own declaration.
My issue with Randy's solution is that the operators for the size type
aren't visible where the instantiation is visible, so you have to with
Ada.Containers specially. (But is that really true? I still have to
check that.) By declaring the type right in the generic package, the user
has immediate access to the size type.
Perhaps it's not such a big deal to have to make a special with of
Ada.Containers. I don't really know. One advantage of Randy's solution
is that the packages can share the size type. So for example you can pass
the result of the Length function of one container to the Resize operation
for some other container, and no type conversion is necessary. On the
other hand, doing that across different container instantiations might be
rare.
So where the size type lives, what its name is, etc, is still very
tentative. The next release will merely show one way to do it.
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 17, 2004 8:42 AM
...
> In usage (b), the index type is merely used to indicate the relative
> positions of the elements of the sequence, and it has long become common
> and programming (at least in Ada!) convention to call the first element
> number 1, the second number 2, and so on. In mathematics, the set N of
> natural (not Natural in the Ada sense!) numbers {1, 2, 3, ...} is almost
> always used for this purpose. In Ada, the subtype Positive is almost always
> used (it is used for String), and I think it makes logical sense to use the
> same subtype for this single purpose.
Positive is rarely used in well-written Ada code, except when using
String. It was a language-design mistake to use Positive for String;
there should have been a separate String_Index type. It was also a
language design mistake to put non-standard stuff like Integer and
Long_Integer in Standard.
> I believe that, in practice, an extensible array will only ever have usage
> (b). Therefore, logically, I think the index type should always be Positive.
I agree with the above philosophy (mappings vs sequences). However, it
does not follow that sequences should always be indexed by Positive.
It should usually be indexed by a type whose range is 1..<something>.
There are good reasons why the programmer might want different upper
bounds. There are also some cases where 0..<something> makes more
sense for a sequence. Therefore, we should leave this choice to the
programmer.
Furthermore, it is important to allow the programmer to use different
index types for unrelated sequences, in order to prevent bugs.
For the same reason, when I declare a sequence-like array type,
I usually declare a new index type for it. If two array types
are related so that I want to say things like:
for I in ... loop
... A(I) ...
... B(I) ...
then I use the same index type for both.
> I think this argument is reinforced by the tangle that using a generic
> Index_Type has obviously got you into. If you simply use Positive, the
> problems all go away. Isn't that a bit of a hint?
My proposal has no "tangles" that I can see. All the tangles are caused
by using modular or enumeration types for the index, which I don't
recommend.
> > I have a growable-array generic in my current project that is
> > instantiated dozens of times, and it has a "range <>" parameter for the
> > index type. Some instantiations share the same index type, but most
> > have their own, and I think that's a Good Thing.
>
> Then ask yourself the question: how difficult would it be to remove the
> "range <>" parameter and use Positive instead throughout? I suspect you
> would find this quite easy to do, and that the result would be easier to
> read and understand.
It would of course be trivial to remove that capability, but that's not
the issue. It would damage the type checking, so I wouldn't do that.
By the way, my growable arrays generic says:
pragma Assert(Index_Type'First = 1);
I did run into one case where that was inconvenient, and I wanted
sequences starting at 100_000_000, 200_000_000, etc.
I decided not to remove that assertion, though.
> > Furthermore, using Positive doesn't solve Randy's problem -- he's got a
> > compiler where Positive'Last = 2**15-1, but the machine has a 32-bit
> > address space, so you very well might want Vectors longer than
> > Positive'Last.
>
> I doubt that very much (that you very well might want Vectors longer than
> Positive'Last). Presumably this decision was made having been satisfied
> that users would not want any String to be longer than 2**15-1 characters.
>
> Surely it would be silly to expect users to be happy with this constraint
> on strings, but rebel against it applying to extensible arrays? Surely, if
> users of this implementation really required bigger extensible arrays, they
> would almost certainly also demand bigger strings, in which case the right
> solution would be to make Integer 32-bit based?
Well, the machine in question is a 32-bit machine, so Integer really
*should* be 32 bits. But Randy chose 16 bits for compatibility reasons,
which makes perfect sense. Perhaps if Randy's customers had followed
good coding practise, he wouldn't have been forced into that decision. ;-)
> > Furthermore, if the Index_Type is "range <>" (which I think it should
> > be), then the Size can reasonably be of a subtype declared like this:
> >
> > subtype Size_Type is Index_Type'Base range 1..Index_Type'Base'Last;
>
> This might be considered a reasonable solution, but it could go wrong. If
> Index_Type'First < 1, it might be possible for an extensible array to reach
> a length greater than Index_Type'Base'Last.
So don't do that. You and I already agreed that Index_Type'First = 1,
usually. Even if it's 0, you can't create a Vector that big, presuming
the upper bound is 2**31-1 on a 32-bit machine.
>... [I think the term 'length' is
> more appropriate than 'size'.]
I agree that size is not ideal. But we're not talking about the
*current* length, we're talking about the maximum length we can grow
to without doing more allocation. How about Buffer_Length, which
appropriately indicates that we're talking about the internal buffer.
> This solution imposes another subtype (or maybe type) upon the user; one
^^^^^^^^^^^^^
I said subtype, not type. We're measuring number of components, here,
not bytes. So it makes perfect sense to use the same type for indexing
as for this size measurement (but obviously a different subtype).
...
> ...
>
> if Size(Apple_Basket) < Size(Orange_Basket) then
>
> This comparison might not work on some implementations. Worse, it might
> work on other implementations, and the user could be pretty mystified as to
> why.
Heh? First of all, given my proposal, the above comparison would be
illegal on *all* implementations. That's what I want -- if
Apple_Baskets and Orange_Baskets are unrelated, then I *want* that
comparison to be illegal. On the other hand, if the two abstractions
are related in such a way that indexes into one make sense for the
other, then the programmer should say so -- use the same index type for
both instantiations. This should be the programmer's choice.
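[Editor's note: The following is a minimal sketch of the type-checking argument above, not code from the proposal. The names Apple_Index, Orange_Index, Apple_Count and Orange_Count are hypothetical, and plain integer types stand in for the vector instantiations.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Index_Typing_Demo is
   --  Two unrelated index types, one per abstraction (hypothetical names).
   type Apple_Index  is range 1 .. 100;
   type Orange_Index is range 1 .. 2000;

   --  Count subtypes taken from each index type's base type, in the
   --  style suggested above.
   subtype Apple_Count  is Apple_Index'Base  range 0 .. Apple_Index'Base'Last;
   subtype Orange_Count is Orange_Index'Base range 0 .. Orange_Index'Base'Last;

   Apples  : constant Apple_Count  := 40;
   Oranges : constant Orange_Count := 1500;
begin
   --  A direct comparison is rejected at compile time, because the two
   --  operands have different types:
   --     if Apples < Oranges then ...

   --  Mixing the two requires an explicit, visible conversion.
   if Orange_Count (Apples) < Oranges then
      Put_Line ("Fewer apples than oranges");
   end if;
end Index_Typing_Demo;
```

If the two abstractions were related, declaring both counts from a single shared index type would make the direct comparison legal without any conversion.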
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 17, 2004 8:38 AM
> No Ada program should depend on predefined elementary types. Period.
So you don't use Boolean in your programs? Maybe it's
"(False, Maybe, True)" on some implementations? ;-)
Sorry, I couldn't resist -- I of course know what you meant.
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 17, 2004 8:53 AM
> I got rid of the subtype Natural in the container packages, per Randy's
> request.
Maybe you should wait for the whole ARG to come to a decision before you
make further changes in this area.
> I modified the proposal and the reference implementation so that each
> generic package declares its own modular Element_Count type. In the case
> of the map it just derives from Hash_Type; in the vector and set it's its
> own declaration.
In the map and set, it should probably be a *signed* type: "type
Element_Count is range 0..implementation-defined". It's got nothing to do
with Hash_Type.
For Vector, it is related to the Index_Type, and should therefore be a
subtype of the same type:
subtype Element_Count is Index_Type'Base range 0..Index_Type'Base'Last;
You might, for example, want to set the size to twice the current length
of the vector. Both types are in the same "units", as it were -- number
of components, so they should be the same type.
(The above assumes that you agree with me that Index_Type should be
"range <>"; I know Randy, and perhaps others, don't agree with that.)
Furthermore, whether two different vectors should have the same
Index_Type and Element_Count type should be the programmer's choice.
Note that sets/maps are different from vectors -- in the former case,
the implementation controls the maximum size (it's related to available
memory), whereas in the vector case, the programmer controls the max
size by choosing the value of Index_Type'Last.
> My issue with Randy's solution is that the operators for the size type
> aren't visible where the instantiation is visible, so you have to with
> Ada.Containers specially. (But is that really true? I still have to
> check that.)
You don't need an extra with_clause, but you would need an extra
use_clause. I agree that's slightly annoying.
>...By declaring the type right in the generic package, the user
> has immediate access to the size type.
But by making it a subtype of the type of Index_Type, all the operators
will be visible wherever the instance is visible.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 17, 2004 9:39 AM
> Maybe you should wait for the whole ARG to come to a decision before you
> make further changes in this area.
OK. Randy wanted an errata list early this week, and I wasn't sure
whether I was responsible for coming up with the version that didn't use
the Natural subtype. It sounds like you guys already have some other ideas.
> In the map and set, it should probably be a *signed* type: "type
> Element_Count is range 0..implementation-defined". It's got nothing to do
> with Hash_Type.
OK. That's the kind of feedback I was looking for.
I also wasn't sure whether you wanted signed or unsigned types as the
size/count/length type. I guess I assumed you'd want unsigned, since
that gives you a bigger range.
...
> (The above assumes that you agree with me that Index_Type should be
> "range <>"; I know Randy, and perhaps others, don't agree with that.)
My tentative conclusion was to do as you suggested, and restrict the
vector to use only integer index types. However, it appears that there
is still debate among the subcommittee, so I guess it's still an open issue.
The only problem with your scheme above is that Index_Type'Base doesn't
necessarily include all the values you need. For example:
type Index_Type is range -10 .. 5;
Index_Type'Base'Last might only be 5, but we need it to be at least 16.
However, since this is supposed to be an expandable array, then maybe
the index type above doesn't make any sense.
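[Editor's note: A small sketch for checking the concern above on a given compiler. The procedure name and the constants Needed and Base_Last are hypothetical. The base range is implementation-defined, so no particular output is implied.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Base_Range_Demo is
   type Index_Type is range -10 .. 5;

   --  Number of elements a vector indexed by the full range would hold.
   Needed : constant := 5 - (-10) + 1;  --  16

   --  Upper bound of the base range chosen by the implementation.
   Base_Last : constant Integer := Integer (Index_Type'Base'Last);
begin
   Put_Line ("Index_Type'Base'Last =" & Integer'Image (Base_Last));
   Put_Line ("Elements needed      =" & Integer'Image (Needed));

   if Needed > Base_Last then
      Put_Line ("Base range is too small to represent the length");
   else
      Put_Line ("Base range happens to be wide enough here");
   end if;
end Base_Range_Demo;
```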
Note that I'm not married to the name Element_Count; it was just an
idea. I was using the container analog of type Storage_Count as the
model. The name Size_Type might be better, which is closer to the
style of name Hash_Type, and to the style of the actual container names.
> Furthermore, whether two different vectors should have the same
> Index_Type and Element_Count type should be the programmer's choice.
>
> Note that sets/maps are different from vectors -- in the former case,
> the implementation controls the maximum size (it's related to available
> memory), whereas in the vector case, the programmer controls the max
> size by choosing the value of Index_Type'Last.
OK. I was assuming the model was the same for all containers (max
elements is controlled by available memory).
>>My issue with Randy's solution is that the operators for the size type
>>aren't visible where the instantiation is visible, so you have to with
>>Ada.Containers specially. (But is that really true? I still have to
>>check that.)
>
> You don't need an extra with_clause, but you would need an extra
> use_clause. I agree that's slightly annoying.
I wasn't sure about that. I was thinking that in order to say "use type
Ada.Containers.Size_Type", you had to with Ada.Containers too. But it
sounds like I was wrong.
> But by making it a subtype of the type of Index_Type, all the operators
> will be visible wherever the instance is visible.
Yes. I like using Index_Type'Base, but wasn't sure whether we would run
into snags wrt the base range of the type being large enough. It sounds
like that's not really an issue.
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 17, 2004 10:20 AM
> I also wasn't sure whether you wanted signed or unsigned types as the
> size/count/length type. I guess I assumed you'd want unsigned, since
> that gives you a bigger range.
This is why I hate modular types. One is tempted to use them when
wraparound arithmetic is inappropriate, just to get one extra bit.
(IMHO, "type T is range 1..2**32-1;" should be legal on all
implementations -- for that matter, so should "range 1..10**100".
But I realize that's a pretty radical notion!)
Anyway, in this case, the extra bit probably isn't necessary. You can't
create a vector of 2 billion integers on a 32-bit machine -- you'll run
out of address space first. Even if the component type is Character,
you're unlikely to want to do that. I believe many operating systems
steal half the address space for their own use, so no single process can
use more than 2 billion bytes anyway. On a 64-bit machine, a vector of
2**62 components is unthinkable anytime soon.
As I said, "1..<something>" will be the most common index range, in
which case 'Length can't be more than 'Last. If that's not enough, buy
a compiler that supports bigger signed integers.
I want overflow/constraint checking on that type. So I suggest signed
integer rather than modular.
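[Editor's note: A minimal sketch of the difference being described, assuming range checks are enabled (they are required by the language unless suppressed). The type names Modular_Count and Signed_Count are hypothetical and deliberately tiny so the boundary is easy to reach.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Wraparound_Demo is
   type Modular_Count is mod 2**8;              --  0 .. 255, wraps around
   type Signed_Count  is range 0 .. 2**8 - 1;   --  0 .. 255, range checked

   M : Modular_Count := Modular_Count'Last;
   S : Signed_Count  := Signed_Count'Last;
begin
   M := M + 1;  --  silently wraps to 0
   Put_Line ("Modular count after increment:" & Modular_Count'Image (M));

   begin
      S := S + 1;  --  result is outside 0 .. 255, so the check fails
      Put_Line ("Signed count after increment:" & Signed_Count'Image (S));
   exception
      when Constraint_Error =>
         Put_Line ("Signed count: Constraint_Error raised");
   end;
end Wraparound_Demo;
```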
> The only problem with your scheme above is that Index_Type'Base doesn't
> necessarily include all the values you need. For example:
>
> type Index_Type is range -10 .. 5;
>
> Index_Type'Base'Last might only be 5, but we need it to be at least 16.
Yes, it is possible to shoot yourself in the foot. So don't do that. ;-)
This is already an issue in Ada -- the programmer must take care to make
sure base ranges are wide enough. Nothing new here.
> However, since this is supposed to be an expandable array, then maybe
> the index type above doesn't make any sense.
It would be rare, I'd say.
...
> OK. I was assuming the model was the same for all containers (max
> elements is controlled by available memory).
Well, I suppose it *usually* will be -- the programmer will use an
Index_Type that goes up to roughly the size of the address space. But
the programmer can choose a smaller Index_Type, and there are sometimes
good reasons to do so.
...
> I wasn't sure about that. I was thinking that in order to say "use type
> Ada.Containers.Size_Type", you had to with Ada.Containers too. But it
> sounds like I was wrong.
If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
visible. Look at the definition of "mentioned in a with_clause".
This is because compilers might have trouble dealing with holes in
the visibility -- cases where something is in scope, but the thing it's
declared inside of is not.
Use clauses don't work like that.
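[Editor's note: A sketch of the with/use distinction above. It assumes the container library as later standardized (Ada.Containers.Count_Type, Ada.Containers.Vectors, type Vector), whose names differ from the draft under discussion here. The instance name Character_Vectors is illustrative.]

```ada
with Ada.Containers.Vectors;  --  also mentions Ada and Ada.Containers
with Ada.Text_IO;

procedure Use_Type_Demo is
   package Character_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Character);

   --  No separate "with Ada.Containers;" is needed for this use_clause,
   --  but the use_clause itself is still required for the operators.
   use type Ada.Containers.Count_Type;

   V : Character_Vectors.Vector;
begin
   V.Append ('a');
   V.Append ('b');

   --  "+" and "=" on Count_Type are directly visible via the use type.
   if V.Length + 1 = 3 then
      Ada.Text_IO.Put_Line ("Count_Type operators are visible");
   end if;
end Use_Type_Demo;
```

Here the generic is instantiated locally, so the with_clause names Ada.Containers.Vectors directly. A with_clause that names only a separately compiled instance does not mention Ada.Containers.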
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 17, 2004 10:56 AM
...
> If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
> visible. Look at the definition of "mentioned in a with_clause".
> This is because compilers might have trouble dealing with holes in
> the visibility -- cases where something is in scope, but the thing it's
> declared inside of is not.
>
> Use clauses don't work like that.
I guess I'm still confused. I just tried this:
with Character_Vectors; use Character_Vectors;
procedure Test is
use type Ada.Containers.Hash_Type;
begin
null;
end Test;
but GNAT is telling me that I'm
"missing with for Ada.Containers"
I put a subtype declaration in the vectors package, like this:
subtype Hash_Type is Containers.Hash_Type;
and then I could say:
use type Character_Vectors.Hash_Type;
But that's different from what Randy said to Nick:
>I agree, but not with your solution. Clearly, there should
>be a Size_Type next to Hash_Type in Ada.Containers. If you
>actually need to do math on it (which should be very rare),
>you'd need a "use type Ada.Containers.Size_Type;", but with
>any decent style, you'll need that no matter what the type
>is or where it is declared.
I didn't know how to get "use type Ada.Containers.Size_Type;" to work
without also with'ing Ada.Containers. But perhaps Randy meant something
else? I'm not sure.
If you want to declare
type Size_Type is range 0 .. <implementation-defined>;
in Ada.Containers, I assumed you'd have to also declare a Size_Subtype
in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this:
subtype Size_Subtype is Size_Type;
and then the user would have to say:
with Instantiation; use type Instantiation.Size_Subtype;
But that's different from saying "use type Ada.Containers.Size_Type;".
****************************************************************
From: Jeffrey Carter
Sent: Tuesday, February 17, 2004 11:35 AM
> OK, what you're calling a Hash Table is what Matt called a Hashed Set. To
> me, a hash table is an index without any elements at all - it's used as part
> of the implementation of some larger component.
We've already established that what Matt calls a "set" isn't.
I'm afraid you're not making yourself clear now. With rare exceptions,
hash functions can produce the same hash value for different elements.
This results in "collisions". Therefore, hash tables store the elements
so a lookup can determine if a specific element is actually in the
table, or just hashes to the same value as another element. Since an
element can contain information not used in calculating the hash or for
"=", it seems that a hash table has to have an interface something like
the one I presented.
In other words, without seeing something more specific (like a spec), I
can't tell how your idea of a hash table would work.
> In any case, as I said earlier, that implementation (which is very similar
> to Matt's) would be horrible on our compiler. You'd end up with 3 separate
> allocations per element, plus a bunch of call overhead. Other compilers'
> mileage may vary (although I'd expect most would generate better code
> without the extra generic).
The solution is simple: don't use your compiler :)
For most applications that will be willing to use a standard component,
I doubt the performance will be unacceptable on any compiler.
> So, you cannot assume that there is "no extra cost" here; it would be
> another entire component. It would, of course, be very similar to the
> "Sorted_Set" component, so it's hard to see that there is enough value to
> having a separate container for The Standard, but I'd expect it to appear in
> the secondary standard (along with List and Sorted_Map).
The component would have to be specified, of course. I'm sure Matt or I
would be able and willing to do that, and it wouldn't take very long.
There is no extra implementation cost. Implementors are going to have to
implement a hash table in order to implement hashed maps anyway. Let's
be good software engineers and allow the reuse of that effort.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 18, 2004 5:12 PM
> I'm afraid you're not making yourself clear now. With rare exceptions,
> hash functions can produce the same hash value for different elements.
> This results in "collisions".
Of course. But to me, a hash table is just a table (array); collision
handling is not part of it. It's a necessary part of a component, of course,
which is why it's impossible to have a hash table component.
But arguing over terminology is pointless. You're arguing in favor of Matt's
Hashed_Set (even if you don't want to call it that). It's better to stick to
a common set of terminology, even if you don't like it.
...
> > In any case, as I said earlier, that implementation (which is very similar
> > to Matt's) would be horrible on our compiler. You'd end up with 3 separate
> > allocations per element, plus a bunch of call overhead. Other compilers'
> > mileage may vary (although I'd expect most would generate better code
> > without the extra generic).
>
> The solution is simple: don't use your compiler :)
Them's fighting words, even with the smiley. Being intolerant of the
diversity of Ada implementations (and uses) is a good way to get yourself
tuned out of ARG deliberations.
> For most applications that will be willing to use a standard component,
> I doubt the performance will be unacceptable on any compiler.
Which of course is exactly the argument I've been making all along. Of
course, then the Sorted_Set and the Vector are also good enough -- which is
quite contrary to your position.
...
> The component would have to be specified, of course. I'm sure Matt or I
> would be able and willing to do that, and it wouldn't take very long.
> There is no extra implementation cost. Implementors are going to have to
> implement a hash table in order to implement hashed maps anyway. Let's
> be good software engineers and allow the reuse of that effort.
I've already said multiple times that there would be a significant extra
implementation cost. Even though some of the implementation could be reused,
there would still be a lot of unique work. In any case, repeating a
falsehood doesn't make it true.
But imagine for a moment that you're right, and there is not a line of extra
code that needs to be written. You're still doubling the documentation,
debugging, and testing costs for implementers. Clearly, this component will
need a unique set of tests, and while there is a bit of sharing available,
most of it will need to be different. And even if there are no bugs in the
implementation at all, you still have to do the testing. So the cost will be
a lot more than zero.
****************************************************************
From: Stephen Leake
Sent: Tuesday, February 17, 2004 12:43 PM
...
> Because you don't always have a value to assign immediately. What you
> want to do is make space in the vector for all the items, and then do
> the assignment. For example, suppose you want to copy a list into a
> vector:
...
> If you don't do it this way, then your time complexity is O(n*m)
> instead of O(n+m).
Ok. I actually ran across a similar situation in Real Code this
weekend :).
If you were doing Insert (at end) rather than Insert (in the middle),
your time complexity would be O(m), right? (n is the size of the
vector, m is the size of the list).
In general, Insert (in the middle) is an O(n) operation. So Insert_N
(in the middle, no elements) is an optimization to work around that in
some common cases.
I think if you are really doing code like this, and you want the
optimization, you should make the Vector Item_Type be an access type,
and manage the memory yourself. Optimized code is always harder to
write.
So I'm affirming that deleting the itemless insertion from the
indefinite map is ok.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 17, 2004 1:10 PM
> Ok. I actually ran across a similar situation in Real Code this
> weekend :).
This happens all the time: you know in advance how many items you want
to insert, so you tell the vector, allowing it to preallocate, and then
you do the insert.
> If you were doing Insert (at end) rather than Insert (in the middle),
> your time complexity would be O(m), right? (n is the size of the
> vector, m is the size of the list).
Yes, that's correct. The n part reduces to 0, because you're not
sliding elements already in the vector container.
> In general, Insert (in the middle) is an O(n) operation. So Insert_N
> (in the middle, no elements) is an optimization to work around that in
> some common cases.
Yes. It is specifically designed for inserting in the middle of a vector.
In the case of the STL, what happens is that you specify an iterator
pair designating the half-open range of the source container. The
vector probably computes the distance() first, then does the internal
expansion, and then walks the source range constructing each new vector
element in place.
For a std::vector, the distance() function is specialized so that it
computes the distance in constant time (because vector iterators are
random access iterators, and therefore distance() can be implemented
for a vector by simple subtraction).
We can't get this sophisticated in Ada, but we can be almost as
efficient. Instead of the vector itself calling distance(), it's the
vector user who computes the distance (by whatever method makes sense),
and then calls Insert_N to do the preallocation.
So in this particular case (inserting multiple elements in the middle of
a vector), in Ada the complete insertion operation actually comprises
two separate calls.
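[Editor's note: A minimal sketch of the two-call pattern described above. It
does not use the API under discussion in this thread. It assumes the
Ada.Containers.Vectors and Ada.Containers.Doubly_Linked_Lists packages as
later standardized in Ada 2005, with Insert_Space standing in for the role
described for Insert_N. The instance names, values, and the insertion index
are illustrative only.

```ada
with Ada.Containers.Vectors;
with Ada.Containers.Doubly_Linked_Lists;

procedure Copy_List_Into_Vector is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists (Integer);

   V    : Int_Vectors.Vector;
   L    : Int_Lists.List;
   C    : Int_Lists.Cursor;
   Slot : Positive := 2;  --  index at which the list items are inserted
begin
   V.Append (1);
   V.Append (5);
   L.Append (2);
   L.Append (3);
   L.Append (4);

   --  Call 1: open a gap of L.Length slots, so that the elements
   --  already in the vector slide only once.
   V.Insert_Space (Before => Slot, Count => L.Length);

   --  Call 2 (repeated): assign each list item into its reserved slot.
   C := L.First;
   while Int_Lists.Has_Element (C) loop
      V.Replace_Element (Slot, Int_Lists.Element (C));
      Slot := Slot + 1;
      Int_Lists.Next (C);
   end loop;

   pragma Assert (V.Element (4) = 4 and then V.Last_Index = 5);
end Copy_List_Into_Vector;
```

Inserting the list items one at a time in the middle would instead slide the
tail of the vector once per item, which is the O(n*m) behavior mentioned
earlier in the thread.]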
> I think if you are really doing code like this, and you want the
> optimization, you should make the Vector Item_Type be an access type,
> and manage the memory yourself. Optimized code is always harder to
> write.
No. Doesn't that argument undermine the case for indefinite forms? The
Insert_N operation provides important and useful functionality, just
like Resize does. There's nothing special about indefinite vectors, and
the same techniques for optimized insertions apply as for the definite form.
> So I'm affirming that deleting the itemless insertion from the
> indefinite map is ok.
I think they need to stay. If nothing else the definite and indefinite
forms require a more or less identical interface.
****************************************************************
From: Alexandre E. Kopilovitch
Sent: Tuesday, February 17, 2004 2:25 PM
This is a return to the topic of interfaces in conjunction with the Container
Library, to the starting point of the recent brief discussion. I'm now taking
another branch of argumentation, which addresses the topic in the most direct
way.
...
> One way to get around that would be to put the interfaces into the generic
> units. But then, the interfaces would only be usable with that container --
> hardly a useful interface! You might as well just use the container
> directly.
I'm not 100% sure exactly what you see as the problem with generic
interfaces in the Container Library, but guessing that you mean massive duplication
of declarations of operations, I came to an idea for how to overcome this problem
with generics and make the use of interfaces in the library rather smooth.
Let's introduce a new form of interface declaration:
type IT is interface of T; -- where T is a type, possibly generic one
This will mean that IT is an interface, which consists of declarations of all
public primitive operations of T, in which all occurrences of type T are
replaced by interface IT. Type T automatically implements IT.
If T in the above declaration is a generic type then IT is a generic interface. In
that case an instantiation (perhaps partial) may be made inside the declaration,
if needed:
type IT is interface of T<instantiation-parameter(s)>; -- for generic T
I think that this form of interface declaration will solve the problem
mentioned above.
[Also, this form may be extended even further - by not requiring T to be a
tagged type (but interface type IT will still be tagged) - with the same
definition, that is, the interface IT consists of all primitive operations
(which are all public in this case) of T. But this probably isn't directly
related to the Container Library.]
****************************************************************
From: Robert A. Duff
Sent: Tuesday, February 17, 2004 6:03 PM
...
> > If you say "with A.B.C;", it causes all of A, A.B, and A.B.C to be
> > visible. Look at the definition of "mentioned in a with_clause".
> > This is because compilers might have trouble dealing with holes in
> > the visibility -- cases where something is in scope, but the thing it's
> > declared inside of is not.
> >
> > Use clauses don't work like that.
>
> I guess I'm still confused.
I don't think you're confused. I think I wrote something confusing above.
Sorry about that.
>... I just tried this:
>
> with Character_Vectors; use Character_Vectors;
>
> procedure Test is
> use type Ada.Containers.Hash_Type;
> begin
> null;
> end Test;
>
> but GNAT is telling me that I'm
>
> "missing with for Ada.Containers"
Correct. If you want to refer to Ada.Containers.Hash_Type,
you need to say "with Ada.Containers;". I was assuming you would
have said "with Ada.Containers.Something;" already, but that's not
necessarily true.
I should probably admonish you to use the RM as the definition of the
language, rather than what one compiler happens to do. ;-)
Chapters 8 and 10 explain all this -- but chapter 8 is pretty
tough going.
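[Editor's note: A minimal sketch of the point being made, assuming a package
Ada.Containers that declares Hash_Type as a modular type (as it does in the
library that was eventually standardized). The variable H and its values are
illustrative only.

```ada
with Ada.Containers;  --  required, because the unit is named directly below

procedure Test is
   --  Makes the primitive operators of Hash_Type ("+", "=", ...) directly
   --  visible without a use-package clause.
   use type Ada.Containers.Hash_Type;

   H : Ada.Containers.Hash_Type := 41;
begin
   H := H + 1;
end Test;
```

A with clause for a child unit such as Ada.Containers.Vectors would also make
the name Ada.Containers usable here, but a with clause for an unrelated
library-level instantiation such as Character_Vectors would not.]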
> I put a subtype declaration in the vectors package, like this:
>
> subtype Hash_Type is Containers.Hash_Type;
>
> and then I could say:
>
> use type Character_Vectors.Hash_Type;
Yes, that could work. However, that will make use-package clauses less
useful, because if you say "use Character_Vectors, Integer_Vectors;",
then the two Hash_Type's will conflict, and cancel each other out.
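[Editor's note: A minimal sketch of the "cancelling out" effect. The two
nested packages are hypothetical stand-ins for the vector instantiations and
contain only the subtype declaration at issue. Ada.Containers.Hash_Type is
assumed to exist as in the eventually standardized library.

```ada
with Ada.Containers;

procedure Cancel_Demo is
   package Character_Vectors is
      subtype Hash_Type is Ada.Containers.Hash_Type;
   end Character_Vectors;

   package Integer_Vectors is
      subtype Hash_Type is Ada.Containers.Hash_Type;
   end Integer_Vectors;

   use Character_Vectors, Integer_Vectors;

   --  H : Hash_Type := 0;
   --  The line above would be rejected: two non-overloadable declarations
   --  named Hash_Type are potentially use-visible, so neither is visible.

   H : Character_Vectors.Hash_Type := 0;  --  an expanded name still works
begin
   H := Ada.Containers."+" (H, 1);
end Cancel_Demo;
```

Both subtypes denote the same type, yet the simple name Hash_Type is unusable
once both packages appear in use clauses.]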
> But that's different from what Randy said to Nick:
>
> >I agree, but not with your solution. Clearly, there should
> >be a Size_Type next to Hash_Type in Ada.Containers. If you
> >actually need to do math on it (which should be very rare),
> >you'd need a "use type Ada.Containers.Size_Type;", but with
> >any decent style, you'll need that no matter what the type
> >is or where it is declared.
>
> I didn't know how to get "use type Ada.Containers.Size_Type;" to work
> without also with'ing Ada.Containers.
You're right.
>... But perhaps Randy meant something
> else? I'm not sure.
>
> If you want to declare
>
> type Size_Type is range 0 .. <implementation-defined>;
>
> in Ada.Containers, I assumed you'd have to also declare a Size_Subtype
> in Ada.Containers.Sorted_Sets and Ada.Containers.Maps, like this:
>
> subtype Size_Subtype is Size_Type;
>
> and then the user would have to say:
>
> with Instantiation; use type Instantiation.Size_Subtype;
>
> But that's different from saying "use type Ada.Containers.Size_Type;".
You're right. I suggest that if Size_Type is declared in Containers,
let the programmer write "with Ada.Containers; use type
Ada.Containers.Size_Type;". Declaring Size_Subtype causes the
"cancelling out" problem I mentioned above. But I don't feel strongly
about this. I do think my suggestion for Vectors solves the problems
better -- but not for sets/maps (unless you pass in the Size_Type as a
generic formal to those).
During the Ada 9X project, we considered a rule that if there are 17
potentially directly visible things called X, and they're all essentially
renamings of the same thing, then the compiler picks one at random.
But the rules would be pretty tricky, and the idea got dropped.
****************************************************************
From: Nick Roberts
Sent: Wednesday, February 18, 2004 12:23 PM
Apologies for this not being in response to anything anyone has
specifically said, but the containers topic has generated such a spout of
messages, it's difficult!
I would repeat (I'm sure I've said it before many times) that the container
packages /do not need/ indefinite forms, now or in the future.
The reason is simple:
(a) if you want to contain an indefinite type, and you want to abstract
away such low-level mechanics as memory management (quite rightly), all you
do is write a package that exports a definite private type, with the
required operations and other accoutrements (constants, support types and
subtypes), and encapsulates the underlying indefinite type inside that
definite type (almost certainly by using dynamic allocation);
(b) to support class-wide types or any indefinite types whose objects are
not dynamically allocated (so that memory management is not an issue), you
can contain an access type that designates them.
For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use
definite containers on unbounded strings without problems.
End of story, and hopefully end of argument.
Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D
reckoned this was good for maps and sets, but not vectors. I still disagree
with Bob about the vector package having its own Index_Type generic
parameter. I think that the practical advantages of having a pre-supplied
universal index type would greatly outweigh the advantages of having it the
way it currently is. Furthermore, I think Randy's idea has the merit of
echoing the approach taken by the existing *_IO packages. Why don't we have
something like this:
type Count is range 0 .. [imp def];
subtype Positive_Count is Count range 1..Count'Last;
declared in Ada.Containers, and then:
generic
type Element_Type is private;
with function "=" (Left, Right : Element_Type)
return Boolean is <>;
package Ada.Containers.Vectors is
pragma Preelaborate;
type Vector_Type is private;
function "=" (Left, Right : Vector_Type) return Boolean;
function Max_Length (Vector : Vector_Type) return Count; -- was Length
function Is_Empty (Vector : Vector_Type) return Boolean;
procedure Clear (Vector : in out Vector_Type);
procedure Swap (Left, Right : in out Vector_Type);
procedure Append (Vector : in out Vector_Type;
New_Item : in Element_Type);
procedure Insert (Vector : in out Vector_Type;
Before : in Positive_Count;
New_Item : in Element_Type);
procedure Insert (Vector : in out Vector_Type;
Before : in Positive_Count);
procedure Insert_N (Vector : in out Vector_Type;
Before : in Positive_Count;
How_Many : in Count;
New_Item : in Element_Type);
...
function Length (Vector : Vector_Type) return Natural; -- was Size
procedure Resize (Vector : in out Vector_Type;
New_Length : in Count);
-- function Front, Back ?
function First (Vector : Vector_Type) return Positive_Count;
...
If the user felt it was important to have index type safety, or an index
base other than 1 -- and I don't think it will be often -- she could always
wrap an instantiation of Ada.Containers.Vectors in a package that provided it.
I could suggest a few more useful operations for vectors. How about vector
concatenation? Slicing?
I might suggest a constant Null_Vector, obviating the need for the Is_Empty
function and Clear procedure, but I must admit one disadvantage of such
constants is that they are not inherited. I've found this a small pain
occasionally. On the other hand, the test V = Foo.Null_Vector might be
considered better (more natural, more readable) than Is_Empty(V) and V :=
Foo.Null_Vector than Clear(V). But personally I'm not sure.
I'm none too keen on the
generic
type Element_Access is access all Element_Type;
function Generic_Element (Vector : Vector_Type;
Index : Index_Type'Base)
return Element_Access;
sub-package. It will surely constrain the implementation to declaring its
internal storage array(s) with aliased components. This could have some
pretty unfortunate effects on efficiency.
I really like the Generic_Sort. That would certainly be very handy.
By the way, I wonder if anyone has thought about a likely implementation of
this package. I know Matt's done a sample imp (which I haven't had time to
look at, sorry), but it seems to me that a reasonably efficient
implementation would not be very simple. Are we saying that implementations
are not expected to be very efficient, or that implementations are expected
to be sophisticated?
Another suggestion that I feel you should think about is a package that has
almost the same interface as A.C.Vectors, but whose container objects are
capable of being metamorphosed (perhaps implicitly, perhaps explicitly, or
perhaps both) between the array form (with fast random access) and the
linked-list form (with efficient appendage). This would fit very neatly
with typical usage: building by successively appending elements, followed
by usage that requires random access (sorting being the classic example).
In the light of this idea, might not a List (linked list) package actually
be more fundamentally useful, that simply had an operation to convert the
list to an array?
****************************************************************
From: Matthew Heaney
Sent: Wednesday, February 18, 2004 1:21 PM
> If the user felt it was important to have index type safety, or an index
> base other than 1 -- and I don't think it will be often -- she could
> always wrap an instantiation of Ada.Containers.Vectors in a package that
> provided it.
The vector package will import a generic formal index type.
> I could suggest a few more useful operations for vectors. How about
> vector concatenation? Slicing?
This is an open issue, and I mentioned this in the errata list I sent
Randy this morning.
> I might suggest a constant Null_Vector, obviating the need for the
> Is_Empty function and Clear procedure, but I must admit one disadvantage
> of such constants is that they are not inherited. I've found this a
> small pain occasionally. On the other hand, the test V = Foo.Null_Vector
> might be considered better (more natural, more readable) than
> Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm
> not sure.
The vector will have Is_Empty and Clear operations.
> I'm none too keen on the
>
> generic
> type Element_Access is access all Element_Type;
> function Generic_Element (Vector : Vector_Type;
> Index : Index_Type'Base)
> return Element_Access;
>
> sub-package. It will surely constrain the implementation to declaring
> its internal storage array(s) with aliased components. This could have
> some pretty unfortunate effects on efficiency.
The aliasing of elements is an open issue (for other reasons), and was
included in the errata list I sent Randy this morning.
> I really like the Generic_Sort. That would certainly be very handy.
>
> By the way, I wonder if anyone has thought about a likely implementation
> of this package. I know Matt's done a sample imp (which I haven't had
> time to look at, sorry), but it seems to me that a reasonably efficient
> implementation would not be very simple. Are we saying that
> implementations are not expected to be very efficient, or that
> implementations are expected to be sophisticated?
It's implemented using an unconstrained array (that's why the container
is named "vector"). The implementation is as complicated as array
manipulation is.
The Generic_Sort in the reference implementation is implemented using a
quicksort algorithm, augmented with a median-of-3 to find the pivot.
> Another suggestion that I feel you should think about is a package that
> has almost the same interface as A.C.Vectors, but whose container
> objects are capable of being metamorphosed (perhaps implicitly, perhaps
> explicitly, or perhaps both) between the array form (with fast random
> access) and the linked-list form (with efficient appendage).
The vector is optimized for inserting at the back end of the container.
Append for a vector is O(1), just like a list is. (The only
difference is that appending to a vector is "amortized" constant time.)
> This would
> fit very neatly with typical usage: building by successively appending
> elements, followed by usage that requires random access (sorting being
> the classic example).
That's exactly how a vector is intended to be used. You do not need a
list to do what you have described.
> In the light of this idea, might not a List
> (linked list) package actually be more fundamentally useful, that simply
> had an operation to convert the list to an array?
There is no list container in this version of the standard container
library.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, February 18, 2004 2:44 PM
On Wednesday 18 February 2004 18:22, Nick Roberts wrote:
> I would repeat (I'm sure I've said it before many times) that the container
> packages /do not need/ indefinite forms, now or in the future.
>
> The reason is simple:
>
> (a) if you want to contain an indefinite type, and you want to abstract
> away such low-level mechanics as memory management (quite rightly), all you
> do is write a package that exports a definite private type, with the
> required operations and other accoutrements (constants, support types and
> subtypes), and encapsulates the underlying indefinite type inside that
> definite type (almost certainly by using dynamic allocation);
Beaten argument. And self contradictory: dynamic allocation *is* memory
management. The "level" of it does not matter. The user does not want to do
*any* memory management.
> (b) to support class-wide types or any indefinite types whose objects are
> not dynamically allocated (so that memory management is not an issue), you
> can contain an access type that designates them.
Sure.
> For strings, Ada.Strings.Unbounded is a perfect example of (a). You can use
> definite containers on unbounded strings without problems.
Unbounded_String is in fact a wonderful container. And a paradigmatic example
of what the user expects from any container. So more ammo against (a).
> End of story, and hopefully end of argument.
Unfortunately no.
> Randy suggested a semi-global Size_Type declared in Ada.Containers. Bob D
> reckoned this was good for maps and sets, but not vectors. I still disagree
> with Bob about the vector package having its own Index_Type generic
> parameter. I think that the practical advantages of having a pre-supplied
> universal index type would greatly outweigh the advantages of having it the
> way it currently is.
I agree. I'm taking the chance to express myself on this issue. For me the
index type could be simply Positive, like in Unbounded_Arrays (a package I
presented at the ASCL Workshop, echoes of which can still be heard in the
current proposal e.g. Resize and unassigned elements).
/*
> Furthermore, I think Randy's idea has the merit of
> echoing the approach taken by the existing *_IO packages.
I never liked this _Count business but ok.
*/
> I might suggest a constant Null_Vector...
No please.
> . . . the test V = Foo.Null_Vector might be
> considered better (more natural, more readable) than Is_Empty(V) and V :=
> Foo.Null_Vector than Clear(V).
Not to me, no.
> I'm none too keen on the
>
> generic
> type Element_Access is access all Element_Type;
> function Generic_Element (Vector : Vector_Type;
> Index : Index_Type'Base)
> return Element_Access;
>
> sub-package. It will surely constrain the implementation to declaring its
> internal storage array(s) with aliased components. This could have some
> pretty unfortunate effects on efficiency.
And it's not terribly useful either. If the user wants to do pointer
programming he can do that himself with containers of pointers, no?
> . . .
> Another suggestion that I feel you should think about is a package that has
> almost the same interface as A.C.Vectors, but whose container objects are
> capable of being metamorphosed
If it's another package with a similar interface then just make it a list,
don't complicate it with transmorphing. I tried a similar stunt with Truc but
then I saw the light :-)
> ... linked-list form (with efficient appendage)
You mean insertion. Appendage can be efficient with vectors.
> In the light of this idea, might not a List (linked list) package actually
> be more fundamentally useful, that simply had an operation to convert the
> list to an array?
Maybe. Personally I wouldn't mind at all seeing a list package there.
Paralleled by a reduction of the vectors interface. Once you have lists, you
don't need the (inefficient) insertion and deletion in the middle of vectors
anymore. And as said above, remove pointer programming support--in all
structural varieties (vectors, lists, maps, sets). The total reduction would
make plenty of space for the so much wanted--and rightfully so--list.
* Indefinite elements revisited : an alternative : elementary containers *
I think we all agree that the main rationale for having indefinite elements is
freeing the user from doing memory management. Many people do not like, want, or
know how, to dance with pointers.
I and Matt have already shown how indefinite elements can be added to the
proposal, with packages paralleling the ones for definite elements, defined
in a one-page annex.
An alternative is to provide a minimal package of 'elementary containers' that
does the required encapsulation of an indefinite inside a definite that the
user can then use to instantiate 'normal' containers. This alternative has
the virtue of focusing on the main requirement (freeing the user from doing
memory management).
generic
type Element_Type (<>) is private;
package Ada.Containers.Elementary is
type Container_Type is private;
function Put (Item : Element_Type) return Container_Type;
function Get (Container : Container_Type) return Element_Type;
end;
package Boxes is new Elementary (My_Indef_Type);
package My_Vectors is new Vectors (Boxes.Container_Type);
use Boxes, My_Vectors;
V : My_Vectors.Vector_Type;
Append (V, Put (My_Indef_Object));
My_Op_Upon_The_Indef_Type (Get (Element (V, 1)));
For a 'real' example see the implementation of Truc
(www.liacc.up.pt/~maa/containers).
This breaks the only-one-instantiation requirement but it is for a good cause
:-)
Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of
indefinite elements, so we can safely assume all the others will be happy
too, and the standard will be embraced by ALL :-)
Note the minimal container is also useful in other situations, e.g. for
making a (core language) array of indefinite elements:
A : array (1 .. 10) of Boxes.Container_Type;
A (1) := Put (My_Indef_Object);
And remember you have memory magic i.e. when you write
A (1) := Put (Another_Indef_Object);
the previous value is cleanly disposed of.
Compare this with all the stuff you have to write (and review, and debug, and
test, and...) to get the same effect with core language devices. (Well, this
is just backing up the rationale above.)
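[Editor's note: A sketch of how such an 'elementary container' might be
implemented. This is an assumption, not the author's code (see Truc for
that). It wraps a heap-allocated copy of the element in a controlled type.
The private part, the body, and the demo procedure are all hypothetical, and
the generic is nested in a procedure only to keep the example self-contained.
It needs Ada 2005 or later, for "overriding" and for a controlled type
declared inside a subprogram.

```ada
with Ada.Finalization;
with Ada.Unchecked_Deallocation;

procedure Elementary_Demo is

   generic
      type Element_Type (<>) is private;
   package Elementary is
      type Container_Type is private;
      function Put (Item : Element_Type) return Container_Type;
      function Get (Container : Container_Type) return Element_Type;
   private
      type Element_Access is access Element_Type;
      type Container_Type is new Ada.Finalization.Controlled with record
         Ref : Element_Access;  --  null until Put has been used
      end record;
      overriding procedure Adjust (Object : in out Container_Type);
      overriding procedure Finalize (Object : in out Container_Type);
   end Elementary;

   package body Elementary is
      procedure Free is
        new Ada.Unchecked_Deallocation (Element_Type, Element_Access);

      function Put (Item : Element_Type) return Container_Type is
      begin
         return (Ada.Finalization.Controlled with
                 Ref => new Element_Type'(Item));
      end Put;

      function Get (Container : Container_Type) return Element_Type is
      begin
         return Container.Ref.all;  --  Constraint_Error if nothing was Put
      end Get;

      --  Called after assignment: give the target its own copy.
      procedure Adjust (Object : in out Container_Type) is
      begin
         if Object.Ref /= null then
            Object.Ref := new Element_Type'(Object.Ref.all);
         end if;
      end Adjust;

      --  Called before a value is overwritten or goes out of scope.
      procedure Finalize (Object : in out Container_Type) is
      begin
         Free (Object.Ref);
      end Finalize;
   end Elementary;

   package Boxes is new Elementary (String);

   A : array (1 .. 2) of Boxes.Container_Type;
begin
   A (1) := Boxes.Put ("first");
   A (1) := Boxes.Put ("a longer second value");  --  old value is freed
   A (2) := A (1);                                --  Adjust makes a deep copy
   pragma Assert (Boxes.Get (A (2)) = "a longer second value");
end Elementary_Demo;
```

The user code never mentions an access type or a deallocation. That is the
"memory magic" referred to above, bought at the price of one allocation per
stored element.]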
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 18, 2004 5:55 PM
> > I'm none too keen on the
> >
> > generic
> > type Element_Access is access all Element_Type;
> > function Generic_Element (Vector : Vector_Type;
> > Index : Index_Type'Base)
> > return Element_Access;
> >
> > sub-package. It will surely constrain the implementation to declaring its
> > internal storage array(s) with aliased components. This could have some
> > pretty unfortunate effects on efficiency.
>
> And it's not terribly useful either. If the user wants to do pointer
> programming he can do that himself with containers of pointers, no?
I think the idea is to allow update-in-place of elements (which matters if
the elements are large or indefinite). It's likely to be more necessary with
Maps than with Vectors, but it's better to have the same operations for all
of the containers.
It wouldn't be necessary to use a generic formal for this purpose, of
course, just put an access type in here:
type Element_Access is access all Element_Type;
function Writable_Element (Vector : Vector_Type;
Index : Index_Type'Base)
return Element_Access;
That's a bit less flexible, but probably flexible enough if the primary
purpose is a reference.
...
> An alternative is to provide a minimal package of 'elementary containers' that
> does the required encapsulation of an indefinite inside a definite that the
> user can then use to instantiate 'normal' containers. This alternative has
> the virtue of focusing on the main requirement (freeing the user of doing
> memory management).
I tend to prefer the two packages mechanism. That's because having the local
memory management also makes the proportionality constant for Inserts and
Sorts much less, and I'd not want to lose that.
Indeed, if the proposal was adopted with both Definite and Indefinite
element types, I'd suggest using the Indefinite version for
large/expensive-to-copy element types even if the type is definite and any
amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two
implementations would be identical, but that would be unusual, and I
wouldn't recommend anyone depend on that.) The Definite version would be
best for small element types (like access types), because it would have a
lot less overhead for adding an item and destroying the container.
****************************************************************
From: Nick Roberts
Sent: Wednesday, February 18, 2004 4:35 PM
Marius Amado Alves wrote:
> Personally I wouldn't mind at all seeing a list package there.
Indeed, and I feel the argument for a list package is really stronger than
for a vectors one. With a list container, you can do all the insertion and
deletion you like perfectly efficiently, and then just convert it to an
array for random access. What's wrong with that? Why then would vectors be
needed at all?
> Many people do not like, want, or know how, to dance with pointers.
I completely agree with this.
> I and Matt have already shown how indefinite elements can be added to the
> proposal, with packages paralleling the ones for definite elements, defined
> in a one-page annex.
Yuk.
> An alternative is to provide a minimal package of 'elementary containers' that
> does the required encapsulation of an indefinite inside a definite that the
> user can then use to instantiate 'normal' containers. This alternative has
> the virtue of focusing on the main requirement (freeing the user of doing
> memory management).
Brilliant! I think this is a superb idea. Maybe we could term a container
of this kind a 'keeper'; I'm sure someone can come up with a better one.
with Ada.Finalization; -- for private part only
generic
   type Element_Type (<>) is private;
package Ada.Containers.Keepers is
   type Keeper is private;
   function To_Keeper (Item : Element_Type) return Keeper;
   function Empty_Keeper return Keeper;
   function Value (Source : Keeper) return Element_Type;
   function Is_Empty (Source : Keeper) return Boolean;
   procedure Clear (Source : in out Keeper);
   procedure Replace (Source : in out Keeper;
                      By     : in Element_Type);
private
   type Element_Access is access Element_Type;
   type Keeper is new Ada.Finalization.Controlled with
      record
         Ref: Element_Access; -- null for empty
      end record;
end;
package My_Keepers is new Ada.Containers.Keepers(My_Indef_Type);
package My_Vectors is new Ada.Containers.Vectors(My_Keepers.Keeper);
use My_Keepers, My_Vectors;
V : My_Vectors.Vector_Type;
Append( V, To_Keeper(My_Indef_Object) );
My_Op_Upon_The_Indef_Type( Value( Element(V,1) ) );
Possibly 'To_Keeper' should be named 'Make_Keeper' or 'New_Keeper'. I've
shown the likely implementation of the Keeper type.
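[Editor's note: the message gives only the spec and the private view of Keeper. The following is a hedged sketch of what a body could look like, assuming deep copy in Adjust and deallocation in Finalize. The generic is declared locally as Keepers (not as a child of Ada.Containers), only a subset of the operations is shown, and the body is an assumption, not part of the proposal.]

```ada
with Ada.Finalization;
with Ada.Unchecked_Deallocation;
with Ada.Text_IO;

procedure Keeper_Demo is

   generic
      type Element_Type (<>) is private;
   package Keepers is
      type Keeper is private;
      function To_Keeper (Item : Element_Type) return Keeper;
      function Value (Source : Keeper) return Element_Type;
      function Is_Empty (Source : Keeper) return Boolean;
      procedure Replace (Source : in out Keeper; By : in Element_Type);
   private
      type Element_Access is access Element_Type;
      type Keeper is new Ada.Finalization.Controlled with record
         Ref : Element_Access;  --  null for empty
      end record;
      overriding procedure Adjust (Object : in out Keeper);
      overriding procedure Finalize (Object : in out Keeper);
   end Keepers;

   package body Keepers is
      procedure Free is
        new Ada.Unchecked_Deallocation (Element_Type, Element_Access);

      function To_Keeper (Item : Element_Type) return Keeper is
      begin
         return (Ada.Finalization.Controlled with
                 Ref => new Element_Type'(Item));
      end To_Keeper;

      function Value (Source : Keeper) return Element_Type is
      begin
         return Source.Ref.all;  --  Constraint_Error if the keeper is empty
      end Value;

      function Is_Empty (Source : Keeper) return Boolean is
      begin
         return Source.Ref = null;
      end Is_Empty;

      procedure Replace (Source : in out Keeper; By : in Element_Type) is
         Old : Element_Access := Source.Ref;
      begin
         Source.Ref := new Element_Type'(By);  --  new size may differ
         Free (Old);
      end Replace;

      procedure Adjust (Object : in out Keeper) is
      begin
         --  After assignment, give the target its own copy of the element.
         if Object.Ref /= null then
            Object.Ref := new Element_Type'(Object.Ref.all);
         end if;
      end Adjust;

      procedure Finalize (Object : in out Keeper) is
      begin
         Free (Object.Ref);  --  no-op for null; leaves Ref = null
      end Finalize;
   end Keepers;

   package String_Keepers is new Keepers (String);
   use String_Keepers;

   A : array (1 .. 3) of Keeper;

begin
   A (1) := To_Keeper ("short");
   A (2) := A (1);                             --  deep copy via Adjust
   Replace (A (1), "a much longer string");    --  old value is freed
   Ada.Text_IO.Put_Line (Value (A (1)));       --  a much longer string
   Ada.Text_IO.Put_Line (Value (A (2)));       --  short
   Ada.Text_IO.Put_Line (Boolean'Image (Is_Empty (A (3))));  --  TRUE
end Keeper_Demo;
```

A reference-counted shallow copy would be an alternative to the deep copy in Adjust; the sketch uses the simpler of the two.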
> Personally I'd be quite happy with this solution. And I'm a REALLY BIG fan of
> indefinite elements, so we can safely assume all the others will be happy
> too, and the standard will be embraced by ALL :-)
I really REALLY like Marius' idea here. Yes please!
> Note the minimal container is useful also for other situations, e.g. for
> making an (core language) array of indefinite elements:
>
> A : array (1 .. 10) of Boxes.Container_Type;
> A (1) := Put (My_Indef_Object);
or alternatively:
A : array (1 .. 10) of My_Keepers.Keeper;
Replace( A(1), My_Indef_Object );
which might be slightly more efficient.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, February 18, 2004 7:32 PM
...
> Indeed, and I feel the argument for a list package is really stronger than
> for a vectors one. With a list container, you can do all the insertion and
> deletion you like perfectly efficiently, and then just convert it to an
> array for random access. What's wrong with that? Why then would vectors be
> needed at all?
That's going to be very expensive if the length of the list is very long
and/or copying the elements is expensive. Matt's design tries to avoid
copying elements as much as possible, and he's particularly concerned with
the containers being able to 'scale-up' to large numbers of elements.
If the sequence (I'm using the general term here) doesn't have very big
elements and can't get very long, you don't need any fancy container to hold
it. Just declare an array of the maximum size and use it.
The value of any container is when one or both of those things is true, and
you do need the memory management implied by a container. And, if you can
only have one sequence container, the vector container (which allows
computed access to elements) is more flexible than the list container (which
doesn't). Besides, a useful list is a lot easier to write than a useful
growable array.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, February 19, 2004 6:51 AM
On Wednesday 18 February 2004 22:34, Nick Roberts wrote:
[Lists and Vectors]
> Marius Amado Alves wrote:
> > Personally I wouldn't mind at all seeing a list package there.
>
> Indeed, and I feel the argument for a list package is really stronger than
> for a vectors one.
I don't feel that way.
> With a list container, you can do all the insertion and
> deletion you like perfectly efficiently, and then just convert it to an
> array for random access. What's wrong with that?
Efficiency. Surely you cannot convert a list of a zillion elements just like
that.
> Why then would vectors be
> needed at all?
See above. And also, you often need the precise vector abstraction. Let it be
there ready for use. Just add the precise list abstraction. They will live
there happily side by side.
[Elementary Containers]
> generic
> type Element_Type (<>) is private;
> package Ada.Containers.Keepers is
> type Keeper is private;
> function To_Keeper (Item : Element_Type) return Keeper;
> function Empty_Keeper return Keeper;
> function Value (Source : Keeper) return Element_Type;
> function Is_Empty (Source : Keeper) return Boolean;
> procedure Clear (Source : in out Keeper);
> procedure Replace (Source : in out Keeper;
> By : in Element_Type);
> private...
Looks good. Compare with this 'real code' example from AI302/2:
generic
   type Element (<>) is private;
   type Element_Ptr is access all Element;
   type Container is private;
   with procedure Put (C : in out Container; E : Element) is <>;
   with function Put (E : Element) return Container is <>;
   with function Get (C : Container) return Element is <>;
   with procedure Delete (C : in out Container) is <>;
   with function Access_Of (C : Container) return Element_Ptr is <>;
   with function "=" (L, R : Container) return Boolean is <>;
   with procedure Overwrite (C : Container; E : Element) is <>;
   with function Img (C : Container) return String is <>;
package Signature is end;
Operations side by side:
Minimal      AI302/2          Nick             Remark
------------------------------------------------------------------
Put(E)->C    Put(E)->C        To_Keeper(E)->C  yes, Insert
Get(C)->E    Get(C)->E        Value(C)->E      yes, Element
             Put(ioE,C)       Replace(ioC,E)   yes, Replace(C,E)?
             Delete(ioC)      Clear(C)         yes, Clear
             Access_Of(C)->P                   for update-in-place
             "="(C,C)->B                       no
             Overwrite(C,E)                    for update-in-place
             Img(C)->S                         no
                              Empty_Keeper->C  no
------------------------------------------------------------------
Abbreviations:
E = element type
C = container type
-> = returns
io = in out
B = Boolean
P = pointer to element
In the remarks:
A "yes" means the operation is definitely a go, with the indicated name for
consistency with AI302/3.
The remark "Replace(C,E)?" is associated with the fact that in AI302/3 the
container parameter of the Replace_Element operation for vectors is just in,
not in out. But in the corresponding operation for maps the container
parameter is in out. Only the ARG and/or Matt can explain this.
The two "for update-in-place" operations:
Access_Of is like the Generic_Element (terrible name) of AI302/3 vectors.
Overwrite(C,E) is logically equivalent to Access_Of (C).all := E.
Overwrite is the update-in-place operation distilled. So if Access_Of (or
Generic_Element) is there just for update-in-place it can be dropped from the
interface.
In C++ Overwrite is dangerous if the new element is bigger than the previous.
I hope Ada can avert this, or at least detect it and raise an exception.
Whatever you do, leave a means for update-in-place in the interface. Albeit
dangerous (?), it is very useful for efficient replacement when the user
knows that the sizes are equal.
* Names. Finalising a proposal *
"Keeper" is too colloquial, no? And has a connotation to football. "Cell"
would be a better metaphor. Of course the container type name and the other
names must get along with each other e.g.
package      container type    element type
------------------------------------------
Elementary   Container_Type    Element_Type
Cells        Cell_Type         Element_Type
Cells        Cell_Type         Value_Type
------------------------------------------
If there are no essential disagreements with this proposal, I and Nick (?)
will try to formalise a proposal, with the options indicated above.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, February 19, 2004 8:25 AM
On Wednesday 18 February 2004 23:48, Randy Brukardt wrote:
[Operations for update-in-place]
> Marius Amado Alves wrote (responding to Nick Roberts):
> > > I'm none too keen on the
> > >
> > > generic
> > > type Element_Access is access all Element_Type;
> > > function Generic_Element (Vector : Vector_Type;
> > > Index : Index_Type'Base)
> > > return Element_Access;
> > >
> > > sub-package. It will surely constrain the implementation to declaring its
> > > internal storage array(s) with aliased components. This could have some
> > > pretty unfortunate effects on efficiency.
> >
> > And it's not terribly useful either. If the user wants to do pointer
> > programming he can do that himself with containers of pointers, no?
>
> I think the idea is to allow update-in-place of elements (which matters if
> the elements are large or indefinite).
If large yes. If indefinite not quite. You have to deal with possibly
different sizes. See my previous message in reply to Nick.
> It's likely to be more necessary
> with Maps than with Vectors,
I don't see why, but ok.
> but it's better to have the same operations
> for all of the containers.
Ok.
> It wouldn't be necessary to use a generic formal for this purpose, of
> course, just put an access type in here:
> type Element_Access is access all Element_Type;
Yes, please do that. The generic breaches the only-one-instantiation
requirement.
[Indefinite elements]
> I tend to prefer the two packages mechanism. That's because having the
> local memory management also makes the proportionality constant for Inserts
> and Sorts much less
If I understand correctly, not quite. Not the "much" anyway. See the
provisions for update-in-place for elementary containers in my previous
message (in reply to Nick).
> Indeed, if the proposal was adopted with both Definite and Indefinite
> element types, I'd suggest using the Indefinite version for
> large/expensive-to-copy element types even if the type is definite and any
> amount of Insert/Delete/Sorting will be done. (For Janus/Ada, the two
> implementations would be identical, but that would be unusual, and I
> wouldn't recommend anyone depend on that.) The Definite version would be
> best for small element types (like access types), because it would have a
> lot less overhead for adding an item and destroying the container.
Note this only applies to *inherently* inefficient operations e.g.
inserting/deleting in vectors. And, again, provisions for update-in-place for
elementary containers minimize the 'problem'.
And shouldn't we avoid mingling definiteness and largeness? They are
independent factors.
Personally, as a user, I'm happy with either solution (Annex <IE> or
elementary containers). I can easily construct either one from the other.
But as an implementer I would prefer the elementary containers solution,
because it is so less trouble. I'm surprised that the real compiler writer
Randy feels the contrary.
And it seems much less work for conformance testing also.
And it probably eases the specification also. Annex <IE> is a bit strange and
bug-prone, because it is assuming that a lot about definite elements
transposes to indefinite. We already found some "anomalies". Elementary
containers is just a 'normal' spec. It does not require any *combined*
testing with the other containers. The user can easily derive by himself any
theorems about a container of elementary containers from the two independent
specs.
And I think everybody prefers a standard that just shows a package spec--over
one that defines one in English.
****************************************************************
From: Randy Brukardt
Sent: Thursday, February 19, 2004 6:21 PM
Marius Amado Alves:
> > I think the idea is to allow update-in-place of elements (which matters if
> > the elements are large or indefinite).
>
> If large yes. If indefinite not quite. You have to deal with possibly
> different sizes.
Well, usually it would be used to update parts (components) of elements, not
the entire thing. If you're going to update the whole thing, use the safer
Replace_Element. Indefinite elements have components, too.
> [Indefinite elements]
>
> > I tend to prefer the two packages mechanism. That's because having the
> > local memory management also makes the proportionality constant for Inserts
> > and Sorts much less
>
> If I understand correctly, not quite. Not the "much" anyway. See the
> provisions for update-in-place for elementary containers in my previous
> message (in reply to Nick).
For most implementations, it will make them much less. The canonical
implementation of a definite element is:
   type Internal_Array is array (Index_Type range <>) of aliased Element_Type;
while for indefinite element is:
   type Element_Access is access all Element_Type;
   type Internal_Array is array (Index_Type range <>) of Element_Access;
so, when you're moving buckets for an insert, you're copying whole elements
in the definite case, and just pointers in the indefinite case. If element
copy is expensive (lots of controlled components, for instance), that can
make a huge difference.
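[Editor's note: a small sketch of the two representations described above, showing what gets moved when slots are slid up to open a hole for an insert. The type names Definite_Array and Indefinite_Array and the 64-character payload are hypothetical, chosen for illustration. The sizes printed depend on the compiler and target, so none are quoted here.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Slide_Demo is

   subtype Index_Type is Positive;

   type Element_Type is record
      Payload : String (1 .. 64) := (others => ' ');
   end record;

   --  Definite form: the elements themselves live in the array.
   type Definite_Array is
     array (Index_Type range <>) of aliased Element_Type;

   --  Indefinite form: the array holds only access values.
   type Element_Access is access all Element_Type;
   type Indefinite_Array is
     array (Index_Type range <>) of Element_Access;

   D : Definite_Array (1 .. 4);
   I : Indefinite_Array (1 .. 4) := (others => new Element_Type);

begin
   D (1).Payload (1) := 'A';
   I (1).Payload (1) := 'A';

   --  Open a hole at position 1 by sliding slots 1 .. 3 up to 2 .. 4.
   D (2 .. 4) := D (1 .. 3);   --  copies three whole elements
   I (2 .. 4) := I (1 .. 3);   --  copies three access values only
   --  Slot 1 is now the hole; a real container would overwrite it next.

   Put_Line ("bits moved per slot, definite:  "
             & Integer'Image (Element_Type'Size));
   Put_Line ("bits moved per slot, indefinite:"
             & Integer'Image (Element_Access'Size));

   --  Both forms see the former first element in slot 2.
   Put_Line (D (2).Payload (1) & I (2).Payload (1));   --  AA
end Slide_Demo;
```

If Element_Type had controlled components, each element copy in the definite form would also invoke Adjust and Finalize, which is the cost the message is pointing at.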
> Note this only applies to *inherently* inefficient operations e.g.
> inserting/deleting in vectors.
Of course. But if you're using them a lot, it matters.
> And, again, provisions for update-in-place for
> elementary containers minimize the 'problem'.
I have no idea what you mean. When you have to copy an element, you have to
copy it. If "elementary containers" (BTW, that name is horrible, because
"elementary" means scalar and access types in Ada, and that is not what you
mean here) uses controlled types and does reference counted shallow copies,
it could avoid some overhead -- but at the cost of a lot of complexity.
> But as an implementer I would prefer the elementary containers solution,
> because it is so less trouble. I'm surprised that the real compiler writer
> Randy feels the contrary.
For us (because of generic sharing), there is no difference between definite
and indefinite elements. The compiler will internally transform
"Element_Type" into "Element_Access" (because the size and contents of the
actual type are unknown). Which is why I'm completely opposed to any
semantics differences between them.
And, because of that, your proposed solution would mean that both containers
would end up doing memory management. So everything would end up allocated
twice (the actual element, and then the "elementary container"). That would
cause serious heap fragmentation problems (Windows is not good at handling
that), and I fear that the combination would be effectively unusable. At
which point we're out of business (changing the implementation of generics
is not an option).
For me, all of the elements should be indefinite, period. We don't need
definite versions. (That would make Janus/Ada look good, our implementation
would be competitive. :-) But I understand why no one else thinks that.
> And it seems much less work for conformance testing also.
Since the semantics are identical for the two packages, use the same tests
(with different types). Much less work than writing two sets of tests from
scratch.
> And it probably eases the specification also. Annex <IE> is a bit strange and
> bug-prone, because it is assuming that a lot about definite elements
> transposes to indefinite. We already found some "anomalies".
Yes, but those are bugs in the design of the container. Do we really want to
be able to put random junk into containers? I don't think so.
There would be a problem if we were to decide to add array operations (since
indefinite can't be a component), but that's far from decided.
...
> And I think everybody prefers a standard that just shows a package spec--over
> one that defines one in English.
That is precisely how all of the Wide_String packages work, and they haven't
caused a lot of problems. Indeed, the advantage of the indefinite packages
is that they are *very* small in terms of standard wording and "weight"
(that is, there is no new concepts to learn and understand with them).
That's not true of "elementary containers".
****************************************************************
From: Jeffrey Carter
Sent: Thursday, February 19, 2004 7:07 PM
Randy Brukardt wrote:
> Of course. But to me, a hash table is just a table (array); collision
> handling is not part of it. It's a necessary part of a component, of
> course, which is why it's impossible to have a hash table component.
OK. That's not the definition of a hash table that I learned, but we're
not really in disagreement. I'm curious, though: if a hash table is just
an array, what are the index and component types?
> Which of course is exactly the argument I've been making all along.
> Of course, then the Sorted_Set and the Vector are also good enough --
> which is quite contrary to your position.
I'd be perfectly happy to not have a hash table or anything based on
one. If they exist, though, I might choose to use a hash table based on
expected performance for a specific application, and I would want to be
able to use it without an ugly kludge. If they exist, I think the
implementation should be available as well as the higher-level components.
****************************************************************
From: Stephen Leake
Sent: Friday, February 20, 2004 3:34 AM
Matthew Heaney <mheaney@on2.com> writes:
> In the case of the STL, what happens it that you specify an iterator
> pair designating the half-open range of the source container. The
> vector probably computes the distance() first, then does the internal
> expansion, and then walks the source range constructing each new
> vector element in place.
>
> For a std::vector, the distance() function is specialized so that it
> computes the distance in constant time (because vector iterators are
> random access iterators, and therefore distance() can be implemented
> for a vector by simple subtraction).
>
> We can't get this sophisticated in Ada, but we can be almost as
> efficient. Instead of the vector itself calling distance(), it's the
> vector user who computes the distance (by whatever method makes
> sense), and then calls Insert_N to do the preallocation.
Hmm. We could require a source container signature package, that
includes cursors and Distance; that should give the same efficiency as
C++ STL. We probably don't want that for ai302.
...
> > So I'm affirming that deleting the itemless insertion from the
> > indefinite map is ok.
>
> I think they need to stay. If nothing else the definite and
> indefinite forms require a more or less identical interface.
Ok, I agree with you; itemless insert is useful and should be in the
indefinite containers.
However, the intended use is that they be immediately followed by a
Replace operation, which specifies the item for each element. So
itemless insert should just insert null pointers in the underlying
container, and any operation that accesses an itemless element should
raise Constraint_Error, since it indicates a user error.
I've looked thru your indefinite_vectors package. Why do you have both
type VT and type Vector_Type?
****************************************************************
From: Matthew Heaney
Sent: Friday, February 20, 2004 1:50 PM
> Hmm. We could require a source container signature package, that
> includes cursors and Distance; that should give the same efficiency as
> C++ STL. We probably don't want that for ai302.
It's not necessary. In the current design it just means you have to
supply the count yourself and do the vector pre-insert, then use your
favorite iteration method (over the target, over the source, active,
passive, etc, etc) to do the actual vector insert.
> Ok, I agree with you; itemless insert is useful and should be in the
> indefinite containers.
>
> However, the intended use is that they be immediately followed by a
> Replace operation, which specifies the item for each element.
Yes, that is correct. The state of the container immediately following
the pre-insert (what I've been calling "Insert_N") is intended only as a
temporary state, as a prelude to some form of replacement of the
elements in the newly-allocated slots.
> So itemless insert should just insert null pointers in the underlying
> container, and any operation that accesses an itemless element should
> raise Constraint_Error, since it indicates a user error.
Yes, for the indefinite form, an item-less insert would give each new
slot the value null (the original non-null values in those positions
would slide up), in anticipation of its replacement by a non-null value.
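[Editor's note: a hedged sketch of the item-less insert behaviour described above, for the indefinite form only. It uses a fixed-capacity array of access values instead of a real growable vector, and the names Slots, Last, and the demo procedure are hypothetical; Insert_N, Replace_Element, and Element follow the names used in this thread but their profiles here are simplified assumptions.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Insert_N_Demo is

   type String_Access is access String;
   type Slot_Array is array (Positive range <>) of String_Access;

   Slots : Slot_Array (1 .. 8);   --  fixed capacity, for brevity only
   Last  : Natural := 0;

   --  Item-less insert: slide existing slots up and leave nulls behind.
   procedure Insert_N (Before : Positive; Count : Natural) is
   begin
      Slots (Before + Count .. Last + Count) := Slots (Before .. Last);
      Slots (Before .. Before + Count - 1) := (others => null);
      Last := Last + Count;
   end Insert_N;

   --  Gives a slot its item (the old item, if any, is leaked here;
   --  a real container would free it).
   procedure Replace_Element (Index : Positive; By : String) is
   begin
      Slots (Index) := new String'(By);
   end Replace_Element;

   function Element (Index : Positive) return String is
   begin
      return Slots (Index).all;   --  Constraint_Error if still null
   end Element;

begin
   Insert_N (Before => 1, Count => 1);
   Replace_Element (1, "tail");

   Insert_N (Before => 1, Count => 2);   --  "tail" slides up to slot 3
   Replace_Element (1, "head");

   Put_Line (Element (1));   --  head
   Put_Line (Element (3));   --  tail

   begin
      Put_Line (Element (2));
   exception
      when Constraint_Error =>
         Put_Line ("slot 2 is still itemless");
   end;
end Insert_N_Demo;
```

The Constraint_Error comes for free from the null dereference check, which matches the suggestion that reading an item-less slot is a user error.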
> I've looked thru your indefinite_vectors package. Why do you have both
> type VT and type Vector_Type?
It's a bit of a trick. I used transitivity of visibility to make the
operations of the type directly visible.
****************************************************************
From: Marius Amado Alves
Sent: Friday, February 20, 2004 5:51 AM
On Friday 20 February 2004 00:20, Randy Brukardt wrote:
> ... When you have to copy an element, you have to
> copy it. If "elementary containers" (BTW, that name is horrible...
The correct name would be "uni-elementary containers". For some reason I lost
the "uni-". I'm considering changing to "cells".
> ... everything would end up
> allocated twice (the actual element, and then the "elementary container").
> That would cause serious heap fragmentation problems (Windows is not good
> at handling that), and I fear that the combination would be effectively
> unusable.
Serious problems? Effectively unusable? Are you sure? Just because of one more
level of allocation? For such small things as pointers? Forgot high
performance is not required?
> ... For me, all of the elements should be indefinite, period.
For me too!
> ... But I understand why no one else thinks that.
I don't (understand)!
> > ... one that defines one in English.
>
> That is precisely how all of the Wide_String packages work, and they
> haven't caused a lot of problems.
I know, but String to Wide_String is not a quantum leap like definite to
indefinite.
> Indeed, the advantage of the indefinite
> packages is that they are *very* small in terms of standard wording and
> "weight" (that is, there is no new concepts to learn and understand with
> them).
Only if the transposition is exceptionless. That is no "anomalies". Can we
assure that? I fear a flood of Ada Questions (?) beginning 2005.
> That's not true of "elementary containers".
Yes, but the new concept (the cell:-) is minimal, useful, "brilliant", natural
to every programmer.
In sum, we have three solutions to choose from, with pros and cons:
                           Only      Def.+     Def.+
                           indef.    indef.    cells
------------------------------------------------------
Changes to AI-302/3        many      few       few
Reference implementation   no        yes       yes
One more useful structure  no        no        yes
Janus issues               no        no        yes
...
****************************************************************
From: Randy Brukardt
Sent: Friday, February 20, 2004 7:32 PM
Marius Amado Alves wrote:
...
> The correct name would be "uni-elementary containers". For some reason I lost
> the "uni-". I'm considering changing to "cells".
"Cells" seems better to me. Short is always good!
> > ... everything would end up
> > allocated twice (the actual element, and then the "elementary container").
> > That would cause serious heap fragmentation problems (Windows is not good
> > at handling that), and I fear that the combination would be effectively
> > unusable.
>
> Serious problems? Effectively unusable? Are you sure? Just because of one more
> level of allocation? For such small things as pointers? Forgot high
> performance is not required?
Well, a "Cell" (which is doing memory management) is not a pointer, it's a
controlled object containing a pointer. (Otherwise, the memory wouldn't be
recovered on scope exit, which is a clear no-no.) So that means its size is
more like 20 bytes for Janus/Ada. So I think I was wrong about fragmentation
problems (it is big enough to avoid those). But it certainly would be a
potential problem for memory use (if there are a lot of them), and a lot more
overhead when items are copied (calls to Finalize and Adjust for each item,
which the directly indefinite version would not have - it wouldn't need
controlled elements as the container itself is controlled). Of course, this
doesn't matter in truly low performance applications, but there are a lot of
middle ground applications in which that could matter.
> > ... But I understand why no one else thinks that.
>
> I don't (understand)!
Bounded forms need to have definite components (the reason for bounded forms
is to have little or no dynamic memory management; it defeats the purpose to
then dynamically allocate the elements). We need to leave room for future
enhancements. Similarly, there is a lot less dynamic memory management
with definite elements. Most implementers claim that's important to their
customers (they want repeatability). (It better not be important to
Janus/Ada customers, because we allocate a lot of things dynamically and
non-contiguously.) I have to trust their judgement.
...
> > Indeed, the advantage of the indefinite
> > packages is that they are *very* small in terms of standard wording and
> > "weight" (that is, there is no new concepts to learn and understand with them).
>
> Only if the transposition is exceptionless. That is no "anomalies". Can we
> assure that? I fear a flood of Ada Questions (?) beginning 2005.
Well, I'm not worrying that the ARG is going to run out of work no matter
what ends up in the Amendment. I fully expect a flood of questions on the
containers. Almost all of the packages in Ada 95 (except the ones defined in
previous standards) generated a lot of questions. Why would this Amendment
be different??
> Yes, but the new concept (the cell:-) is minimal, useful, "brilliant", natural
> to every programmer.
One more minor advantage to indefinite element containers: they only require
one instantiation to use. The "cell" solution requires two.
****************************************************************
From: Nick Roberts
Sent: Saturday, February 21, 2004 12:54 PM
> "Cells" seems better to me. Short is always good!
I like that name too. Splendid.
> Well, a "Cell" (which is doing memory management) is not a pointer, it's
> a controlled object containing a pointer. (Otherwise, the memory
> wouldn't be recovered on scope exit, which is a clear no-no.) So that
> means its size is more like 20 bytes for Janus/Ada. So I think I was
> wrong about fragmentation problems (it is big enough to avoid those).
> But it certainly would be a potential problem for memory use (if there
> are a lot of them), and a lot more overhead when items are copied (calls
> to Finalize and Adjust for each item, which the directly indefinite
> version would not have - it wouldn't need controlled elements as the
> container itself is controlled). Of course, this doesn't matter in truly
> low performance applications, but there are a lot of middle ground
> applications in which that could matter.
It may or may not be a problem for memory use. The size of one 'cell'
object would, as you say, comprise in the ball park of 20 bytes (a tag and
a linked-list 'next' access value, in addition to the access value referring
to the contained indefinite object). I reckon the overhead (the tag and the
next pointer) is likely to be 8 bytes in most cases, although it could be
quite a lot more. However, if the average size of each contained object is
significantly more than this overhead, it is unlikely to be really
significant (it may be a little annoying). If the indefinite objects are
relatively small on average, it matters. I'm not really sure, myself, which
scenario will be prevalent in practice.
> One more minor advantage to indefinite element containers: they only
> require one instantiation to use. The "cell" solution requires two.
The extra instantiation could be somewhat amortised away in some (perhaps
many) realistic situations.
   type Fragment_Count is range 0..2000;
   subtype Fragment_Number is Fragment_Count range 1..Fragment_Count'Last;
   package Gene_Fragments is new Ada.Containers.Cells(Gene_Array);
   subtype Gene_Fragment is Gene_Fragments.Cell; use Gene_Fragments;
   package Fragment_Gangs is
      new Ada.Containers.Vectors(Fragment_Number,Gene_Fragment);
   subtype Fragment_Gang is Fragment_Gangs.Vector; use Fragment_Gangs;
   type Fixed_Gang is array (Fragment_Number range <>) of Gene_Fragment;
   Sample: constant Fixed_Gang := Ref_Samp_1 & Ref_Samp_2;
Here, the instantiation of a cell package permits us to declare an array of
cells in addition to a vector of them. I feel that cells would quite often
be useful for purposes other than allowing a (definite) container to
contain indefinite objects.
An advantage of definite containers over their indefinite counterparts is
that they permit conversion to and from arrays (including the slicing of a
linear container). The cell technique would have the extra advantage that,
since only definite containers are used, these array operations would
remain available. I feel that in itself could be quite a compelling argument.
In my ignorance, could I ask please what the presumed (proper)
implementation of Vectors is?
In my mind forms a picture of a tree structure with the leaves containing
(or pointing to) actual arrays which form fragments of the whole conceptual
array. Each fragment would have a counter saying how many of its elements
are actually used. Appending an element would require adding a leaf node if
there was no more space in the end fragment. Random selection of an element
would require descending the tree. Am I way off the mark?
If I'm not way off the mark, I would contend that building a linked list
and converting to an array (for subsequent random access) would be likely
to be superior (to building a vector and selecting randomly from it by tree
descent) in a majority of cases in practice.
****************************************************************
From: Matthew Heaney
Sent: Monday, February 23, 2004 6:23 PM
> In my ignorance, could I ask please what the presumed (proper)
> implementation of Vectors is?
See the files
ai302-containers-vectors.ad?
ai302-containers-indefinite_vectors.ad?
in the latest reference implementation for the details.
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040223.zip>
This implementation has a few more examples, some of which use the new
indefinite vector container. Look thru the anagram examples for some ideas.
> In my mind forms a picture of a tree structure with the leaves containing
> (or pointing to) actual arrays which form fragments of the whole conceptual
> array. Each fragment would have a counter saying how many of its elements
> are actually used. Appending an element would require adding a leaf node if
> there was no more space in the end fragment. Random selection of an element
> would require descending the tree. Am I way off the mark?
A vector is implemented as an unconstrained array.
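[Editor's note: the following is a minimal sketch of the "unconstrained array" model described above, not the reference implementation. The names Vector_Sketch_Demo, Int_Array, Vector and Append, the initial capacity of 4, and the doubling growth policy are all assumptions made for illustration.]

```ada
with Ada.Unchecked_Deallocation;

procedure Vector_Sketch_Demo is

   type Int_Array is array (Positive range <>) of Integer;
   type Int_Array_Access is access Int_Array;

   procedure Free is
     new Ada.Unchecked_Deallocation (Int_Array, Int_Array_Access);

   --  Hypothetical representation: one heap-allocated unconstrained
   --  array plus a count of the slots actually in use.
   type Vector is record
      Items : Int_Array_Access;
      Last  : Natural := 0;
   end record;

   procedure Append (V : in out Vector; Item : Integer) is
   begin
      if V.Items = null then
         V.Items := new Int_Array (1 .. 4);
      elsif V.Last = V.Items'Last then
         --  The array is full: allocate one twice as long, copy the
         --  used slice across, and release the old array.
         declare
            Bigger : constant Int_Array_Access :=
              new Int_Array (1 .. 2 * V.Items'Last);
         begin
            Bigger (1 .. V.Last) := V.Items (1 .. V.Last);
            Free (V.Items);
            V.Items := Bigger;
         end;
      end if;
      V.Last := V.Last + 1;
      V.Items (V.Last) := Item;
   end Append;

   V : Vector;

begin
   for I in 1 .. 10 loop
      Append (V, I * I);
   end loop;

   --  Random access is plain array indexing; no tree descent is needed.
   if V.Last /= 10 or else V.Items (7) /= 49 then
      raise Program_Error;
   end if;

   Free (V.Items);
end Vector_Sketch_Demo;
```

[With this model, selecting an element costs one index operation, and the cost of appending is dominated by the occasional reallocation and copy.]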
> If I'm not way off the mark, I would contend that building a linked list
> and converting to an array (for subsequent random access) would be likely
> to be superior (to building a vector and selecting randomly from it by tree
> descent) in a majority of cases in practice.
To convert between container types, just use one of the iterators.
The container library takes pains to give the library user easy and
efficient access to the container elements (that means the actual
objects).
It is never the case that a container needs, say, an operation to
convert itself to an array specifically. A container iterator allows
the library user himself to choose the target type, whatever makes the
most sense for him.
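[Editor's note: a minimal sketch of converting a container to an array by iterating over it. It assumes the package and operation names of the later standardized library (Ada.Containers.Doubly_Linked_Lists, First, Next, Has_Element, Element) rather than the draft names used in this thread; List_To_Array_Demo, Integer_Array and To_Array are hypothetical.]

```ada
with Ada.Containers.Doubly_Linked_Lists;

procedure List_To_Array_Demo is

   package Integer_Lists is
     new Ada.Containers.Doubly_Linked_Lists (Element_Type => Integer);
   use Integer_Lists;

   --  The target type is chosen by the library user, not the container.
   type Integer_Array is array (Positive range <>) of Integer;

   --  Copy the list elements, in order, into an array.
   function To_Array (L : List) return Integer_Array is
      Result : Integer_Array (1 .. Natural (Length (L)));
      Index  : Positive := 1;
      C      : Cursor := First (L);
   begin
      while Has_Element (C) loop
         Result (Index) := Element (C);
         Index := Index + 1;
         C := Next (C);
      end loop;
      return Result;
   end To_Array;

   L : List;

begin
   Append (L, 10);
   Append (L, 20);
   Append (L, 30);

   declare
      A : constant Integer_Array := To_Array (L);
   begin
      if A'Length /= 3 or else A (2) /= 20 then
         raise Program_Error;
      end if;
   end;
end List_To_Array_Demo;
```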
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 24, 2004 10:27 AM
Nick Roberts wrote:
>
> I might suggest a constant Null_Vector, obviating the need for the
> Is_Empty function and Clear procedure, but I must admit one disadvantage
> of such constants is that they are not inherited. I've found this a
> small pain occasionally. On the other hand, the test V = Foo.Null_Vector
> might be considered better (more natural, more readable) than
> Is_Empty(V) and V := Foo.Null_Vector than Clear(V). But personally I'm
> not sure.
This won't work, because the vector type privately derives from
Controlled, and therefore you can't declare a constant of the type in a
package with preelaborate categorization.
However, a constructor function would work. Here are some ideas:
function Null_Vector return Vector_Type;
function Empty_Vector return Vector_Type;
function New_Vector return Vector_Type;
function To_Vector (Length : Size_Type) return Vector_Type;
function To_Vector (New_Item : Element_Type;
                    Count    : Size_Type)
   return Vector_Type;
I actually had a need for something like this in one of my examples.
It's kind of a pain that the language doesn't give you a default
constructor for a type that you can pass as a parameter. For example,
in C++ I can say:
container.insert(T());
where T() invokes the default ctor for the element type T.
Ada does let you do something like this, when constructing an aggregate:
type NT is new T with record
   I : Integer;
end record;
Object : constant NT := (T with I => 42);
Here we're allowed to use T as the value of the parent part of NT, when
constructing an aggregate of type NT. But I can't use the type name as
the value of a parameter:
Insert (Container, New_Item => T); -- not legal Ada
I have to say something like:
Insert (Container, New_Item => New_T);
where New_T is a function that returns type T.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, February 24, 2004 1:39 PM
> It's kind of a pain that the language doesn't give you a default
> constructor for a type that you can pass as a parameter.
..
> Here we're allowed to use T as the value of the parent part of NT, when
> constructing an aggregate of type NT. But I can't use the type name as
> the value of a parameter:
>
> Insert (Container, New_Item => T); -- not legal Ada
True, but Ada 200Y lets you say:
Insert (Container, New_Item => (<>));
which is a default-initialized aggregate. Which is what you want, right??
(See AI-287.)
We originally tried to use the type name here, but it led to all kinds of
problems, and it isn't providing any actual information, so we decided to
use the box "<>" instead.
So all you really want is an Ada 200Y compiler. :-)
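[Editor's note: a minimal sketch of a default-initialized aggregate passed as a parameter. It assumes the form that was eventually standardized in Ada 2005, where the box appears inside a component association ("others => <>") rather than as the bare "(<>)" written above. The Point type and the use of Ada.Containers.Doubly_Linked_Lists are illustrative assumptions.]

```ada
with Ada.Containers.Doubly_Linked_Lists;

procedure Default_Aggregate_Demo is

   --  Hypothetical element type whose components all have defaults.
   type Point is record
      X : Integer := 0;
      Y : Integer := 0;
   end record;

   package Point_Lists is
     new Ada.Containers.Doubly_Linked_Lists (Element_Type => Point);
   use Point_Lists;

   L : List;

begin
   --  "others => <>" requests each component's default value, so no
   --  named constructor function is needed for a default object.
   Append (L, New_Item => (others => <>));

   if First_Element (L).X /= 0 or else First_Element (L).Y /= 0 then
      raise Program_Error;
   end if;
end Default_Aggregate_Demo;
```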
****************************************************************
From: Gary Dismukes
Sent: Tuesday, February 24, 2004 3:13 PM
> This won't work, because the vector type privately derives from
> Controlled, and therefore you can't declare a constant of the type in a
> package with preelaborate categorization.
Not completely true. In Ada 200Y you can make a private type have
preelaborable initialization, in which case constants of the type
can be declared in preelaborable packages (see AI-161). Type
Ada.Finalization.Controlled (and Limited_Controlled) are defined
to have preelaborable initialization, though there's a restriction
that if a user-defined controlled type overrides Initialize then
the type doesn't have preelaborable initialization.
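[Editor's note: a minimal sketch of the situation described above, assuming the Ada 2005 pragma Preelaborable_Initialization. The package name Preelab_Sketch and the single Length component are hypothetical; the real vector type has a different full view.]

```ada
with Ada.Finalization;

package Preelab_Sketch is
   pragma Preelaborate;

   type Vector_Type is private;
   pragma Preelaborable_Initialization (Vector_Type);

   --  A constant of a controlled private type, declared in a
   --  preelaborated package.
   Null_Vector : constant Vector_Type;

private

   --  Initialize is deliberately not overridden: per the restriction
   --  described above, overriding it would forfeit preelaborable
   --  initialization.
   type Vector_Type is new Ada.Finalization.Controlled with record
      Length : Natural := 0;
   end record;

   Null_Vector : constant Vector_Type :=
     (Ada.Finalization.Controlled with Length => 0);

end Preelab_Sketch;
```

[The standardized Ada.Containers.Vectors package follows the same pattern: it is preelaborated and declares a constant Empty_Vector.]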
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 24, 2004 4:15 PM
OK. Thanks for the info.
The vector and (hashed) map containers don't override the Initialize
operation.
The (sorted) set does override Initialize. Let me see if I can get rid
of that.
It might not matter anyway, since we can use the new "(<>)" notation to
construct an anonymous instance of the type.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, February 24, 2004 4:27 PM
I just got rid of the override of Initialize for the set. The full view
of Set_Type now looks like:
function New_Back return Node_Access;
type Set_Type is new Controlled with record
Tree : Tree_Type := (Back => New_Back, Length => 0);
end record;
The function New_Back does the allocation and initialization that I was
doing in Initialize.
I'll fold this change into the next release of the reference implementation.
****************************************************************
From: Matthew Heaney
Sent: Friday, February 27, 2004 12:29 PM
I just uploaded the latest version of the reference implementation:
<http://home.earthlink.net/~matthewjheaney/charles/ai302-20040227.zip>
This version includes indefinite forms for all containers. There are
also two more anagram examples, and a new genealogy example.
****************************************************************
From: Tucker Taft
Sent: Friday, February 27, 2004 4:07 PM
I had a couple of problems compiling this.
One problem is that you have two versions of package "String_Vectors",
one in the top-level dir, and one in the indefinite_vectors subdirectory.
You might want to delete the indefinite_vectors subdirectory, since it
is redundant with the ai302-containers-indefinite_vectors stuff, and
it is confusing because one uses "Natural" where the other uses "Size_Type."
The other problem I had was with your "Control_Type" in the
private part of indefinite_vectors/indefinite_vectors.ads. Again,
this is largely redundant with ai302-containers-indefinite_vectors.
But for what it is worth, the former one doesn't compile with
our latest compiler, because the type declaration:
type VT is new Rep_Types.Vector_Type;
fails with complaints about trying to add primitive operations
after a type is frozen. It is a bit subtle, but this type
declaration is in fact implicitly declaring additional operations
on "Control_Type" *after* Control_Type has been passed to a generic.
The solution I came up with was putting the declaration of
Control_Type into a nested package ("Inner") starting at the
declaration of Control_Type and ending after the generic
instantiation producing Rep_Types. Then the declaration
of VT is outside the (inner) package, meaning that the additional
operations it implicitly declares with parameters of type
Control_Type don't end up as primitives of Control_Type.
A corresponding change is needed in the body of Indefinite_Vectors.
In any case, ai302-containers-indefinite_vectors.ad? doesn't
have this problem -- you use a different approach.
I'll let you know about any other problems I encounter.
Very nice work, in any case!
****************************************************************
From: Matthew Heaney
Sent: Friday, February 27, 2004 4:43 PM
I wasn't sure whether I still needed the old indefinite_xxx subdirectories.
Those were originally created to show how to implement an indefinite
container as a thin layer on top of the official definite containers.
However, after I did that Randy suggested that having indefinite forms
as an official part of the library might be acceptable, so I went ahead
and implemented them, up in the parent directory.
I can either remove them entirely from the release, or move them off
into a deprecated subdirectory.
I suppose a README couldn't hurt, either...
> The other problem I had was with your "Control_Type" in the
> private part of indefinite_vectors/indefinite_vectors.ads.
...
OK. That's easy enough to fix. (I don't really need that derived type.
It was only declared to effect transitivity of visibility.)
> The solution I came up with was putting the declaration of
> Control_Type into a nested package ("Inner") starting at the
> declaration of Control_Type and ending after the generic
> instantiation producing Rep_Types. Then the declaration
> of VT is outside the (inner) package, meaning that the additional
> operations it implicitly declares with parameters of type
> Control_Type don't end up as primitives of Control_Type.
> A corresponding change is needed in the body of Indefinite_Vectors.
OK. Thanks for the tip.
> In any case, ai302-containers-indefinite_vectors.ad? doesn't
> have this problem -- you use a different approach.
Indeed. That version is implemented natively, not as a thin layer.
The versions in the parent directory are the only ones you really care
about. I can move those other versions to somewhere less confusing.
> I'll let you know about any other problems I encounter.
OK, thanks. I can fold any changes into the next release.
I'll be at the meeting in Phoenix, so we can discuss any other issues
you have.
> Very nice work, in any case!
Thanks. I was able to build the reference implementation from the spare
parts I had lying around for Charles, so it was a big job but not that big.
I was just thinking today that it would be nice to have a functional
insertion operation, like this:
--see wordcount.adb
declare
N : Natural renames Insert (Map'Access, Word, 0).all;
begin
N := N + 1;
end;
or like this:
--see genealogy.adb
declare
Roots : Set_Type renames Insert (Map'Access, Key => "---").all;
begin
...
This simulates what I can do in C++ using operator[]().
One way to declare it is:
function Insert
(Map : access Map_Type;
Key : String) return access Element_Type;
I was thinking the cursor selectors could be declared like this:
function To_Element
(Cursor : Cursor_Type) return access Element_Type;
function To_Key
(Cursor : Cursor_Type) return access constant Key_Type;
If functions could return an anonymous access type this would allow me
to get rid of the Generic_Element and Generic_Key functions.
Just some ideas...
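[Editor's note: a minimal sketch of the word-count idiom without a function returning an anonymous access type. It assumes the operations of the later standardized library (the five-parameter Insert that reports Position and Inserted, plus Update_Element), not the draft API under discussion; Word_Count_Demo, Count_Word and Bump are hypothetical names.]

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;

procedure Word_Count_Demo is

   package Count_Maps is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,
      Element_Type    => Natural,
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");
   use Count_Maps;

   Counts : Map;

   --  Insert Word with a count of 0 if it is absent, then increment
   --  the count in place through the cursor that Insert returns.
   procedure Count_Word (Word : String) is

      procedure Bump (Key : String; N : in out Natural) is
      begin
         N := N + 1;
      end Bump;

      Position : Cursor;
      Inserted : Boolean;
   begin
      Insert (Counts, Word, 0, Position, Inserted);
      Update_Element (Counts, Position, Bump'Access);
   end Count_Word;

begin
   Count_Word ("ada");
   Count_Word ("map");
   Count_Word ("ada");

   if Element (Counts, "ada") /= 2 or else Element (Counts, "map") /= 1 then
      raise Program_Error;
   end if;
end Word_Count_Demo;
```

[Insert leaves an existing element untouched and still returns its position, so a single lookup serves both the "new word" and "seen before" cases.]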
****************************************************************
From: Dan Eilers
Sent: Saturday, February 28, 2004 2:14 PM
In ai302/test_sets.adb, on line 91, there is a call to
"find" that appears to be ambiguous, matching the find
declared in test_sets.adb on line 51, and the find
declared in integer_vectors.
****************************************************************
From: Adam Beneschan
Sent: Monday, March 1, 2004 6:33 PM
...
> fails with complaints about trying to add primitive operations
> after a type is frozen. It is a bit subtle, but this type
> declaration is in fact implicitly declaring additional operations
> on "Control_Type" *after* Control_Type has been passed to a generic.
Can this be right? Essentially the source is equivalent to:
generic ...
package Indefinite_Vectors is
private
   type Control_Type is new Controlled with record ... end record;
   package Rep_Types is
      type Vector_Type is private;
      procedure Append (Vector   : in out Vector_Type;
                        New_Item : in Control_Type);
   private ...
   end Rep_Types;
   type VT is new Rep_Types.Vector_Type;
end Indefinite_Vectors;
The derived type declaration causes a new inherited subprogram to be
declared implicitly:
procedure Append (Vector : in out VT;
New_Item : in Control_Type);
But as I read RM 3.2.3 and particularly 3.2.3(4), the derived
subprogram Append is a primitive subprogram of type VT, but *not* a
primitive subprogram of type Control_Type. So there shouldn't be an
error message about primitive subprograms being added after
Control_Type is frozen (even if there were some declaration that froze
Control_Type before the declaration of VT, which there isn't in my
reduced example).
Also, 3.9.2(13) makes "the explicit declaration of a primitive
subprogram of a tagged type" illegal after the type is frozen, but
this is not an explicit subprogram declaration.
So what did I miss?
****************************************************************
From: Randy Brukardt
Sent: Monday, March 1, 2004 6:57 PM
...
> But as I read RM 3.2.3 and particularly 3.2.3(4), the derived
> subprogram Append is a primitive subprogram of type VT, but *not* a
> primitive subprogram of type Control_Type.
Humm. This looks messy. Primitive subprograms have to be explicitly declared
for initial types. But 3.2.3(4) says that inherited routines are primitive
for derived types. It doesn't say that routines inherited *from the parent
type* are primitive. In this case, Control_Type is derived, so inherited
routines are primitive -- and this routine is certainly inherited.
Of course, that seems to be a nonsense interpretation of the language. I
think that 3.2.3(4) was intended to apply only to routines inherited from
the parent. So the question is whether that can be derived from other
language (in which case Tucker's compiler has a bug), or if there is
actually a language hole.
****************************************************************
From: Adam Beneschan
Sent: Monday, March 1, 2004 7:26 PM
> Humm. This looks messy. Primitive subprograms have to be explicitly declared
> for initial types. But 3.2.3(4) says that inherited routines are primitive
> for derived types. It doesn't say that routines inherited *from the parent
> type* are primitive. In this case, Control_Type is derived, so inherited
> routines are primitive -- and this routine is certainly inherited.
The exact language of 3.2.3(2,4) is:
The primitive subprograms of a specific type are defined as
follows:
For a derived type, the inherited (see 3.4) user-defined
subprograms;
So we refer to 3.4 to see what it says about "inherited user-defined
subprograms". 3.4(17) says, "For each user-defined primitive
subprogram... of the parent type that already exists at the place of
the derived_type_definition, there exists a corresponding _inherited_
primitive subprogram of the derived type with the same defining name".
The primitive subprograms of the parent type that exist at the time
Control_Type is defined are those that exist for Control_Type's parent
type, Ada.Finalization.Controlled, namely Initialize, Finalize,
Adjust.
So to me, those are "the inherited user-defined subprograms" to which
3.2.3(4) refers. I've always interpreted it that way, just from the
language of those two sections, independently of any other language in
the RM or of any conclusion that a different interpretation would be
nonsense.
> Of course, that seems to be a nonsense interpretation of the language. I
> think that 3.2.3(4) was intended to apply only to routines inherited from
> the parent.
I agree. I personally think the intent is already clear from the RM.
****************************************************************
From: Randy Brukardt
Sent: Thursday, April 29, 2004 9:59 PM
I've just posted the updated Container library AI. [This is version /03.] This
was updated to reflect the conclusions of the six hours of discussion (which
was a record for a single AI) at the Phoenix meeting.
I'm happy to say that most of the suggestions made here were implemented in
some way. Indefinite element containers were added, as well as a list
container. Set operations were added to the set package. Iteration was
changed somewhat to be more familiar to Ada programmers. The operations and
their semantics were made more regular.
Comments are welcome. (But please remember that I have to read and file all
of them for the permanent record, so try to take the long-winded discussions
of philosophy to comp.lang.ada. :-)
****************************************************************
From: Pascal Obry
Sent: Friday, April 30, 2004 1:26 AM
That's great news! Congratulations to all for the hard work on this issue.
****************************************************************
From: Marius Amado Alves
Sent: Friday, April 30, 2004 3:04 PM
> I've just posted the updated Container library AI...
Excellent!
Just a tiny comment at this time: the names Indefinite_Vectors, etc. do not
sound right to me, because the element type is indefinite, not the
containers. Alternatives:
1. Containers.Indefinite_Elements.Vectors
2. Containers.Vectors_Of_Indefinite_Elements
3. Containers_Of_Indefinite_Elements.Vectors
("Indefinite_Elements" is not literally correct either because the type, not
the elements, is indefinite. But it is a common idiom to say "things" in
place of "thing type".)
I think I like 3.
****************************************************************
From: Jean-Pierre Rosen
Sent: Friday, April 30, 2004 8:20 AM
Everybody talks about a real vector, or a complex matrix. Doesn't seem
to hurt the mathematicians...
****************************************************************
From: Marius Amado Alves
Sent: Friday, April 30, 2004 11:21 AM
Vector of real numbers... real vector.
Vector of elements of indefinite type... vectors of indefinite elements...
indefinite vector.
Ok, I think the ears will get accustomed.
****************************************************************
From: Jeffrey Carter
Sent: Friday, April 30, 2004 5:27 PM
> Comments are welcome. (But please remember that I have to read and file all
> of them for the permanent record, so try to take the long-winded discussions
> of philosophy to comp.lang.ada. :-)
Perhaps I'm missing something, but I don't see why the vector component
needs the assertion anymore. If it's not needed, it would be nice to
eliminate it.
****************************************************************
From: Dan Eilers
Sent: Friday, April 30, 2004 6:25 PM
Some typos:
> All containers are non-limited, and hence allow ordinary assignment. In
> the unique case of a vector, there is a separate assignment procedure:
>
> Assert (Target => V1, Source => V2);
^^^^^^
> The reason is that the model for a vector is that it's implemented using
> an unconstrained array. During ordinary assignment, the internal array
> is deallocated (during controlled finalization), and then a new internal
> [array] is allocated (during controlled adjustment) to store a copy of the
^^^^^^^
"is may not"
hat the average bucket
caching *effects
arbitary
conbined
evalution
exmples
Generic_Revserse_Find
heirarchy
Indefinited_Hashed_Maps
insuffiently
machinary
simplied
stratgies
sucessful
****************************************************************
From: Christoph Grein
Sent: Thursday, May 6, 2004 4:07 AM
A few more typos:
specify precisely where this will happen (it will happen no lat{t}er than the
^^^
AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are
[the] only ways that an element can change from empty to non-empty.
^^^^^
Any exceptions raising during element assignment
raised (as everywhere else)
cursor designates with a[n] index value (or a cursor designating an element at
^^^
declared in Containers.Vectors with a[n] ambiguous (but not invalid, see below)
^^^
but it is {is} a *different* element
^^^^
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 6, 2004 3:07 PM
What happened to the Lower_Bound, Upper_Bound and "insert with hint"
operations for sets? They were very useful. Is there a way to make the same
kind of searches/updates with the new spec?
Furthermore, often the user already has a cursor value for an element that he
knows is a bound for another search he wants to make. It should be possible
to use this information to improve the search.
Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
providing something similar, albeit more restrictive (the known cursor had to
be adjacent). The current version does not have even this.
/* I found these requirements in real world situations, namely writing a
database system that uses large sets to store some things. */
At least implementation permissions/advice should exist allowing/encouraging
implementations to provide these optimized search/update operations. Namely
via operations with the standard profiles except having additional "hint"
parameters. Better yet make a number of these optimized profiles standard,
permitting the actual optimization to be null. To assure portability.
****************************************************************
From: Matthew Heaney
Sent: Monday, May 10, 2004 6:48 PM
> What happened to the Lower_Bound, Upper_Bound and "insert with hint"
> operations for sets? They were very useful. Is there a way to make the same
> kind of searches/updates with the new spec?
I tried to keep them, but I argued badly and hence lost that vote.
I discussed restoring these operations (Lower_Bound and Upper_Bound)
with Randy, and he said I'd have to post a message on ada-comment
justifying why those operations are needed. (You have to do it that way
since the entire ARG voted in the last meeting, and you have to give them
the opportunity to reconsider their decision during the next meeting.)
It's good that you're asking about this, since that's evidence that
there is interest in these operations from someone other than me.
It would be helpful if you could post a follow-up message on ada-comment
giving a specific example of why you need LB and UB. The ARG can then
put this on the agenda for the ARG meeting in Palma.
> Furthermore, often the user already has a cursor value for an element that he
> knows is a bound for another search he wants to make. It should be possible
> to use this information to improve the search.
That's similar to insert-with-hint. However, the ARG members weren't
persuaded by my defense of optimized insertion.
> Previous versions (e.g. 1.1) of the spec had an "insert with hint" operation
> providing something similar, albeit more restrictive (the known cursor had to
> be adjacent). The current version does not have even this.
Yes, that is correct. Personally I can live without the
insert-with-hint operations (because you have insert-sans-hint), but I
think removing the Lower_Bound and Upper_Bound operations was a mistake,
since that leaves no way to find the set element nearest some key. All
you have now is basically a membership test, which is too coarse a
granularity.
For example, someone on CLA had an ordered set of integers, and he
wanted to iterate over the values in [0, 1000), then from [1000, 2000),
etc. Without Lower_Bound there's no way to do that.
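[Editor's note: a minimal sketch of the range scan described above. It assumes the Ceiling function of the later standardized Ada.Containers.Ordered_Sets, which returns a cursor to the smallest element not less than the given item and so plays the role discussed here for Lower_Bound. Range_Scan_Demo and Sum_In_Range are hypothetical names.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Range_Scan_Demo is

   package Integer_Sets is
     new Ada.Containers.Ordered_Sets (Element_Type => Integer);
   use Integer_Sets;

   S : Set;

   --  Sum the elements E with Low <= E < High.
   function Sum_In_Range (Low, High : Integer) return Integer is
      C     : Cursor := Ceiling (S, Low);
      Total : Integer := 0;
   begin
      while Has_Element (C) and then Element (C) < High loop
         Total := Total + Element (C);
         C := Next (C);
      end loop;
      return Total;
   end Sum_In_Range;

begin
   Insert (S, 5);
   Insert (S, 999);
   Insert (S, 1000);
   Insert (S, 1500);
   Insert (S, 2000);

   if Sum_In_Range (0, 1000) /= 1004
     or else Sum_In_Range (1000, 2000) /= 2500
   then
      raise Program_Error;
   end if;
end Range_Scan_Demo;
```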
> /* I found these requirements in real world situations, namely writing a
> database system that uses large sets to store some things. */
Please post an example of what you're trying to do, and show how it
can't be done without Lower_Bound and Upper_Bound.
> At least implementation permissions/advice should exist allowing/encouraging
> implementations to provide these optimized search/update operations. Namely
> via operations with the standard profiles except having additional "hint"
> parameters. Better yet make a number of these optimized profiles standard,
> permitting the actual optimization to be null. To assure portability.
Give an example of why you need Lower_Bound and Upper_Bound, and request
that the ARG put it on the agenda for Palma.
Some other possibilities are:
procedure Find
(Container : in Set;
Key : in Key_Type;
Position : out Cursor;
Success : out Boolean);
If the Key matches, then Success=True and Position.Key = Key.
Otherwise, Success=False and Key < Position.Key.
Technically you don't need that, since you can test the result of
Lower_Bound:
declare
   C : constant Cursor := Lower_Bound (Set, Key);
begin
   if Key < C then
      null; --C denotes next (successor) neighbor
   else
      null; --C denotes node containing Key
   end if;
end;
Another possibility is to name it something like "Ceiling" or whatever.
An additional possibility is something (STL) like:
procedure Equal_Range
(Container : in Set;
Key : in Key_Type;
Lower_Bound : out Cursor;
Upper_Bound : out Cursor);
Then you can test:
Lower_Bound = Upper_Bound => key not found
Lower_Bound /= Upper_Bound => found
This latter operation has the benefit of working with multisets too.
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, May 11, 2004 6:55 AM
Upon Heaney's advice, I'll detail the case for optimized operations for
sets.
I use Lower_Bound in the implementation of Mneson, in at least four
subprograms, excerpted below. For the entire code see
www.liacc.up.pt/~maa/mneson. Mneson is a database system based on a directed
graph implemented as a set of links.
Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for
Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set
are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then
by the 2nd. Front_Vertex is an unlinked vertex value lower than any other.
procedure Delete_All_In_Range
(Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
is
use Link_Sets;
I : Cursor_Type := Lower_Bound (Link_Set, From);
begin
while I /= Back (Link_Set) loop
exit when Element (I) > To;
Delete (Link_Set, I);
end loop;
end;
procedure For_Each_Link_In_Range
(Set : Link_Sets.Set_Type; From, To : Link_Type)
is
use Link_Sets;
I : Cursor_Type := Lower_Bound (Set, From);
E : Link_Type;
begin
I := Lower_Bound (Set, From);
while I /= Back (Set) loop
E := Element (I);
exit when E > To;
Process (E);
Increment (I);
end loop;
end;
function Connected (Source : Vertex) return Boolean is
use Link_Sets;
begin
return
Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
end;
function Inv_Connected (Target : Vertex) return Boolean is
use Link_Sets;
begin
return
Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor;
end;
I'm also developing optimized algorithms for set intersection that require
not only Lower_Bound but also search with hint (known bounds), and
eventually Upper_Bound. These are still on the drawing board, but I already
know at this point that they require those operations. Soon I'll have some
code, but it's rather complicated, because Mneson sets can be of various
kinds, extensional and intensional, the basic extensional being a designated
vertex whose targets are the elements, the intensional being a dedicated
"selection" structure, designed for lazy evaluation, with elements being
represented in several ways, and materialized only upon certain operations
like iteration and extraction.
My interest is databases. At least here, ordered sets are an incredibly
useful thing. Pretty much every interesting database function can be defined
in terms of them. In a graph-based implementation like Mneson, set
intersection is crucial.
The spec now has the full set algebra (union, intersection, differences,
etc.) That is good, and if their performance were ideal for all purposes,
I'd be silent.
But I know their performance cannot be ideal in many situations, because I
know optimization techniques that require more than what the spec now
offers. Namely they require search with hint and/or Lower_Bound.
And anyway the spec does not specify performance for them (only for Insert,
Find, Element).
Also note that the Find operations for Vectors and Hashed_Maps are kind of
hintful, so it's only fair that Ordered_Sets have these versions too.
For databases, performance is paramount. Even apparently small gains matter.
"Apparently" because many database functions scale worse than linearly, e.g.
cross products. Optimization in these cases is a must. In many cases the
optimization makes all the difference (between feasible and unfeasible).
Optimization is invariably based on knowledge the system prepares about the
sets in the expression queried for computation. The preparation time is
usually negligible. In a system implemented with Ada.Containers, a great part
of the prepared knowledge is ultimately expressed as cursor values for known
element value bounds for the sought element ranges.
Ordered_Sets implementations are likely to be able to take advantage of this
knowledge for improving time performance (the previous AI302 "insert with
hint" is an example).
Therefore it is required that this knowledge can be passed to the basic
operations.
Immodestly assuming I've made a convincing case, I can report that Heaney
and I have solid ideas on what the operations should look like and we
are ready to prepare a pretty open-shut proposal for Palma. I myself will be
there from Sunday to Sunday, and happily available for discussion. To me the
most promising format is *prescribing* hintful versions of Find et al. but
with only *advised* performance, i.e. allowing null optimization.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, May 11, 2004 12:37 PM
> Link_Sets is an instantiation of AI302.Containers.Ordered_Sets for
> Link_Type, which is an array (1 .. 2) of vertices. Link_Set and Inv_Link_Set
> are Link_Sets.Set_Type objects. Links are ordered by the 1st component, then
> by the 2nd. Front_Vertex is an unlinked vertex value lower than any other.
>
> procedure Delete_All_In_Range
> (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
> is
> use Link_Sets;
> I : Cursor_Type := Lower_Bound (Link_Set, From);
> begin
> while I /= Back (Link_Set) loop
> exit when Element (I) > To;
> Delete (Link_Set, I);
> end loop;
> end;
You might want to vet From and To, to assert that they're in order. It
also looks like you mean to delete the node designated by To (this is
apparently a closed range), which means you could use Upper_Bound to
find the endpoint of the range:
procedure Delete_All_In_Range
(Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
is
pragma Assert (From <= To);
use Link_Sets;
I : Cursor_Type := Lower_Bound (Link_Set, From);
J : constant Cursor_Type := Upper_Bound (Link_Set, To);
begin
while I /= J loop
Delete (Link_Set, I);
end loop;
end;
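[Editor's note: a minimal sketch of the same closed-range deletion against the later standardized Ada.Containers.Ordered_Sets. Two assumptions differ from the draft code above: Ceiling stands in for Lower_Bound, and the standardized Delete sets its cursor to No_Element instead of advancing it, so the successor is saved first. Range_Delete_Demo is a hypothetical name and Integer elements replace Link_Type.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Range_Delete_Demo is

   package Integer_Sets is
     new Ada.Containers.Ordered_Sets (Element_Type => Integer);
   use Integer_Sets;

   --  Delete every element E with From <= E <= To.
   procedure Delete_All_In_Range (S : in out Set; From, To : Integer) is
      I      : Cursor := Ceiling (S, From);
      Next_I : Cursor;
   begin
      while Has_Element (I) and then Element (I) <= To loop
         Next_I := Next (I);  --  remember the successor first
         Delete (S, I);       --  standardized Delete resets I
         I := Next_I;
      end loop;
   end Delete_All_In_Range;

   S : Set;

begin
   for E in 1 .. 10 loop
      Insert (S, E);
   end loop;

   Delete_All_In_Range (S, From => 3, To => 7);

   if Length (S) /= 5
     or else Contains (S, 3)
     or else Contains (S, 7)
     or else not Contains (S, 2)
     or else not Contains (S, 8)
   then
      raise Program_Error;
   end if;
end Range_Delete_Demo;
```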
> procedure For_Each_Link_In_Range
> (Set : Link_Sets.Set_Type; From, To : Link_Type)
> is
> use Link_Sets;
> I : Cursor_Type := Lower_Bound (Set, From);
> E : Link_Type;
> begin
> I := Lower_Bound (Set, From); --???
> while I /= Back (Set) loop
> E := Element (I);
> exit when E > To;
> Process (E);
> Increment (I);
> end loop;
> end;
This again appears to be a closed range, so I recommend using
Upper_Bound to find the endpoint:
procedure For_Each_Link_In_Range
(Set : Link_Sets.Set_Type; From, To : Link_Type)
is
pragma Assert (From <= To);
use Link_Sets;
I : Cursor_Type := Lower_Bound (Set, From);
J : constant Cursor_Type := Upper_Bound (Set, To);
begin
while I /= J loop
Process (Element (I));
Increment (I);
end loop;
end;
Alternatively, you could use the new Generic_Update procedure:
procedure For_Each_Link_In_Range
(Set : Link_Sets.Set_Type; From, To : Link_Type)
is
pragma Assert (From <= To);
use Link_Sets;
procedure Process (E : in out Link_Type) is
begin
...; --whatever
end;
procedure Update is new Generic_Update;
I : Cursor_Type := Lower_Bound (Set, From);
J : constant Cursor_Type := Upper_Bound (Set, To);
begin
while I /= J loop
Update (I);
Increment (I);
end loop;
end;
(Note that I only have the vectors done in the reference implementation.)
> function Connected (Source : Vertex) return Boolean is
> use Link_Sets;
> begin
> return
> Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
> end;
Lower_Bound will only return Null_Cursor if the value is greater than
every element in the set. So it looks like you're testing whether the
value is less than or equal to an element in the set. There are
probably other ways to implement this predicate function, for example:
function Connected (Source : Vertex) return Boolean is
use Link_Sets;
begin
if Is_Empty (Source) then
return False;
end if;
return Link_Type'(Source, Front_Vector) <= Last_Element (Links);
end;
> function Inv_Connected (Target : Vertex) return Boolean is
> use Link_Sets;
> begin
> return
> Lower_Bound (Inv_Links, (Target, Front_Vertex)) /= Null_Cursor;
> end;
Ditto for this function.
The moral here is you don't need Lower_Bound if all you do is throw away
its result.
However, it looks like in the first two examples, you have a legitimate
need for Lower_Bound (and arguably Upper_Bound, too).
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, May 11, 2004 1:20 PM
> ...
> > function Connected (Source : Vertex) return Boolean is
> > use Link_Sets;
> > begin
> > return
> > Lower_Bound (Links, (Source, Front_Vertex)) /= Null_Cursor;
> > end;
>
>
> Lower_Bound will only return Null_Cursor if the value is greater than
> every element in the set.
Oops, this was a bug. Thanks a lot for catching it. What I must have meant
is:
X := Lower_Bound (Links, (Source, Front_Vertex));
return X /= Null_Cursor and then Element (X) (1) = Source;
Thanks a lot for the other suggestions too. I won't be applying them yet
because if-it-works-dont-fix-it, but I've certainly queued them in the
Mneson "to do" list.
> ...it looks like in the first two examples, you have a legitimate
> need for Lower_Bound (and arguably Upper_Bound, too).
Yes. And these, unlike the specific version of Connected above, are used and
tested.
(It seems the specific version of Connected above had not been used yet. Which
accounts for its fault not being detected. It's there in the library
because when I wrote the library it looked like it would be necessary. Thanks
to you, if and when it does, it will be flawless. Thanks again.)
****************************************************************
From: Tucker Taft
Sent: Tuesday, May 11, 2004 2:17 PM
I think I missed the beginning of this discussion,
but I would agree with the suggestion for using
Floor and Ceiling rather than Lower_Bound and Upper_Bound,
to find the nearest element of the set no greater
(or no less, respectively) than a given value.
And I agree they would be useful operations on an ordered set.
Lower_Bound and Upper_Bound seem more likely to refer to the
minimum and maximum elements of the entire set.
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, May 11, 2004 3:31 PM
> ... I would agree with the suggestion for using
> Floor and Ceiling...
Good. One of the proposals I'm discussing with Matt has indeed
function Floor (Item : Element_Type) return Cursor;
function Ceiling (Item : Element_Type) return Cursor;
where Ceiling = Lower_Bound, but Floor /= Upper_Bound, Floor =
Reverse_Lower_Bound.
(Here Lower_Bound and Upper_Bound are the functions defined in version 1.1
of the spec, and that were dropped in the current. Reverse_Lower_Bound is a
fictitious function like Lower_Bound but in reverse order.)
The proposal also has
function Slice (Container : Set; Low, High : Cursor) return Set;
function Open_Bound (Position : Cursor) return Cursor;
The four functions provide a complete search optimization framework. The
main idea is that a slice can be used to convey range and/or optimization
information to any operation.
Slice returns the subset of Set consisting of the elements of Set that are
in the specified interval.
Open_Bound returns a cursor marked as an open bound when used in Slice. An
unmarked cursor represents a closed bound.
The integer set example, namely to iterate over the values in [0, 1000),
then [1000, 2000), etc., becomes:
procedure Iterate is new Generic_Iteration;
begin
Iterate (Slice (Integer_Set, Floor (0), Open_Bound (Ceiling (1000))));
Iterate (Slice (Integer_Set, Floor (1000), Open_Bound (Ceiling (2000))));
Also, with this framework, Upper_Bound (Set, Item) can be realised
functionally as:
First
(Slice
(Set,
Open_Bound (Ceiling (Set, Item)),
Last (Set)))
So no need for Upper_Bound.
The only sensitive aspect of this framework is the use of a slice as an
object of update operations (Insert, etc.) A slice is likely to be best
represented as a 'virtual' set, i.e. only a 'view' to the corresponding
subset of its 'ground' container. We are currently checking whether and how
and which update operations can process this virtual object properly.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, May 11, 2004 5:13 PM
> Lower_Bound and Upper_Bound seem more likely to refer to the
> minimum and maximum elements of the entire set.
As Marius has pointed out, Ceiling is equivalent to Lower_Bound.
There is no function that corresponds to a Floor function in the STL,
Charles, or earlier releases of the AI-302 draft. I did discuss how to
implement a floor function in the examples section of earlier drafts, as
follows:
Floor (S, K) = Previous (Upper_Bound (S, K))
Here Floor is derived from Upper_Bound. In the most recent draft, for
subtle reasons you have to implement Floor as:
function Floor (S, K) return Cursor is
C : Cursor := Upper_Bound (S, K);
begin
if C = No_Element then
return Last (S);
else
return Previous (C);
end if;
end;
To derive Upper_Bound from Floor, I think it would be:
function Upper_Bound (S, K) return Cursor is
C : Cursor := Floor (S, K);
begin
if C = No_Element then
return First (S);
else
return Next (C);
end if;
end;
To iterate over the half-open range [K1, K2), where K1 <= K2, I think
you would have to write:
declare
I : Cursor := Ceiling (S, K1);
J : Cursor := Floor (S, K2);
begin
if J = No_Element then
J := First (S);
end if;
while I /= J loop
...
Next (I);
end loop;
end;
However, this seems a little awkward. (Assuming my analysis is correct.
I have to think about whether [K1, K2) is a closed range or a
half-open range. Marius's example was a closed range.)
What we really need is something to complement Ceiling, something like
"Strict_Ceiling" or "Proper_Ceiling", e.g.
declare
I : Cursor := Ceiling (S, K1); -- K1 <= I.Key
J : Cursor := Proper_Ceiling (S, K2); -- K2 < J.Key
begin
while I /= J loop ...;
end;
Is there a technical term for "proper ceiling"? I want a function that,
given a key, returns the smallest key greater than the key. (That's
what function Upper_Bound returns, but that name seems to be confusing
to people unfamiliar with the STL.)
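[Editor's note: the range idioms discussed in this message can be sketched
against Ada.Containers.Ordered_Sets as it was eventually standardized in
Ada 2005, which provides Floor and Ceiling but no Upper_Bound. This is only
an illustration under that assumption, not the draft API under discussion
here; the names Range_Demo, Sum_Half_Open and Sum_Closed are hypothetical.
For a half-open range [K1, K2), calling Ceiling at both ends needs no
endpoint special case; for a closed range [K1, K2], the end cursor is
obtained by stepping past Floor (S, K2).]

```ada
with Ada.Containers.Ordered_Sets;

procedure Range_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Integer_Sets;

   S : Set;

   --  Sum of the elements X with K1 <= X < K2 (half-open range).
   function Sum_Half_Open (K1, K2 : Integer) return Integer is
      pragma Assert (K1 <= K2);
      I     : Cursor := Ceiling (S, K1);           --  smallest element >= K1
      J     : constant Cursor := Ceiling (S, K2);  --  smallest element >= K2
      Total : Integer := 0;
   begin
      while I /= J loop
         Total := Total + Element (I);
         Next (I);
      end loop;
      return Total;
   end Sum_Half_Open;

   --  Sum of the elements X with K1 <= X <= K2 (closed range).
   function Sum_Closed (K1, K2 : Integer) return Integer is
      pragma Assert (K1 <= K2);
      I     : Cursor := Ceiling (S, K1);  --  smallest element >= K1
      J     : Cursor := Floor (S, K2);    --  greatest element <= K2
      Total : Integer := 0;
   begin
      if J = No_Element then
         return 0;  --  K2 is below every element, so the range is empty
      end if;
      Next (J);     --  one past the last element <= K2
      while I /= J loop
         Total := Total + Element (I);
         Next (I);
      end loop;
      return Total;
   end Sum_Closed;

begin
   Insert (S, 500);
   Insert (S, 999);
   Insert (S, 1000);
   Insert (S, 1500);
   Insert (S, 2000);

   pragma Assert (Sum_Half_Open (0, 1000) = 1499);
   pragma Assert (Sum_Half_Open (1000, 2000) = 2500);
   pragma Assert (Sum_Half_Open (2000, 5000) = 2000);
   pragma Assert (Sum_Closed (1000, 2000) = 4500);
   pragma Assert (Sum_Closed (-10, -5) = 0);
end Range_Demo;
```

[The checks are pragma Assert, so they only run when assertions are enabled
(with GNAT, the -gnata switch).]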
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, May 11, 2004 5:19 PM
Two corrections:
Slice returns the subset of *Container* consisting of the elements of
*Container* that are in the specified interval (not Set, that's the type).
Upper_Bound (S, Item) =
First
(Slice
(S,
Open_Bound (*Floor* (S, Item)),
Last (S)))
(S instead of Set because that's the type name, and Floor, not Ceiling)
Sorry.
BTW, Reverse_Upper_Bound (S, Item) =
Last
(Slice
(S,
First (S),
Open_Bound (Ceiling (S, Item)))).
Also,
S = Slice (S, First (S), Last (S))
should always hold.
Currently thinking about the special cases, namely those with occurrences of
No_Element.
And about the slice-for-update problem: easily solved with a specification
similar to the current one for invalid cursors, given that a slice is
expressed as cursor values.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 3:15 AM
> Here Floor is derived from Upper_Bound. In the most recent draft, for
> subtle reasons you have to implement Floor as:
>
> function Floor (S, K) return Cursor is
> C : Cursor := Upper_Bound (S, K);
> begin...
But you don't have Upper_Bound in the most recent draft!
> What we really need is something to complement Ceiling, something like
> "Strict_Ceiling" or "Proper_Ceiling", e.g.
I'm against strange things in the spec. Give the user only well known
concepts. A complete set of primitive well known concepts. Ceiling, Floor,
Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants.
> Is there a technical term for "proper ceiling"?
Smallest_Greater_Than :-) But then to be complete you need also
Greatest_Smaller_Than. But then again, don't give strange things.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 3:44 AM
> procedure Delete_All_In_Range
> (Link_Set : in out Link_Sets.Set_Type; From, To : Link_Type)
> is
> pragma Assert (From <= To);
>
> use Link_Sets;
> I : Cursor_Type := Lower_Bound (Link_Set, From);
> J : constant Cursor_Type := Upper_Bound (Link_Set, To);
> begin
> while I /= J loop
> Delete (Link_Set, I);
> end loop;
> end;
My impression is that the original version is more efficient, because it
only calls a search function once (Lower_Bound). Your version makes two
calls (Lower_Bound, Upper_Bound). I assume these operations have O(log n)
time performance, and the others (Back, Element, Delete) constant time. But
my version calls these more times. So I'd have to check the absolute times.
This also provides an example for optimized search with Slice. Because I
know the upper bound must be above the lower, I could pass this information
to Upper_Bound:
J : constant Cursor_Type :=
Upper_Bound (Slice (Link_Set, I, Last (Link_Set)), To);
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 9:33 AM
>>What we really need is something to complement Ceiling, something like
>>"Strict_Ceiling" or "Proper_Ceiling", e.g.
>
> I'm against strange things in the spec. Give the user only well known
> concepts. A complete set of primitive well known concepts. Ceiling, Floor,
> Slice, Open_Bound. Then he can derive whatever Strange_Ceiling he wants.
Does Upper_Bound qualify as a "well known concept"? All I'm trying to
do is come up with another name for Upper_Bound.
>>Is there a technical term for "proper ceiling"?
>
> Smallest_Greater_Than :-) But then to be complete you need also
> Greatest_Smaller_Than. But then again, don't give strange things.
But Upper_Bound isn't a strange thing.
I suspect Stepanov was motivated by the set-theoretic terms "upper
bound," "least upper bound", etc. But I think it's that conflation to
which Tucker objects.
Other names for Upper_Bound are: limit, supremum, supremum limit, etc.
procedure Op (K1, K2 : Key_Type) is
I : Cursor := Ceiling (Set, K1);
J : constant Cursor := Limit (Set, K2);
begin
while I /= J loop ...;
end;
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 10:16 AM
> Lower_Bound and Upper_Bound seem more likely to refer to the
> minimum and maximum elements of the entire set.
One counter-argument is that both Lower_Bound and Upper_Bound accept a key.
Maybe we could provide these:
Lower_Limit
Floor
Ceiling (AKA Lower_Bound)
Upper_Limit (AKA Upper_Bound)
with the following semantics:
Key (Lower_Limit (S, K)) < K
Key (Floor (S, K)) <= K
Key (Ceiling (S, K)) >= K
Key (Upper_Limit (S, K)) > K
****************************************************************
From: Tucker Taft
Sent: Wednesday, May 12, 2004 11:41 AM
I don't find the names Lower_Limit and Upper_Limit
a whole lot better than Lower_Bound/Upper_Bound.
I don't see why you need them. It seems
Lower_Limit(S,K) = Previous(Ceiling(S,K)) and
Upper_Limit(S,K) = Next(Floor(S,K))
Or am I confused?
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 1:46 PM
No, you got it right, except for the endpoints; see my last message.
For example, if Ceiling(S,K) returns No_Element (because K is large),
then Previous(Ceiling(S,K)) returns No_Element, whereas Lower_Limit
returns Last(S).
We can define the abstraction to have the semantics you describe above,
but I think that requires that (1) the set has an internal sentinel and
(2) type Cursor is privately tagged.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 10:36 AM
> Lower_Limit
> Floor
> Ceiling (AKA Lower_Bound)
> Upper_Limit (AKA Upper_Bound)
Better to keep a consistent metaphor: Ground, Floor, Ceiling, Roof. "Limit" is
too abstract.
Alternatives for Ground: Basement, Base... Underworld :-)
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 11:47 AM
> Lower_Limit
> Floor
> Ceiling (AKA Lower_Bound)
> Upper_Limit (AKA Upper_Bound)
In mathematics "lower limit" applies to a sequence of values (e.g. the values
of sin (x) with x from zero to infinity), and means the least value of the
sequence. So it's really more similar to First.
[My main source for checking this stuff has been the Wikipedia
(en.wikipedia.org)]
Have you considered my Slice, Open_Bound proposal yet?
Recapitulating:
Ground, Floor, Ceiling, Roof, do not solve the problem of providing search
optimization information to the other operations.
Slice, Open_Bound do.
And Ground, Roof can be derived from Ceiling, Floor, Slice, Open_Bound, First,
Last.
So my proposal is adding Ceiling, Floor, Slice, Open_Bound.
And eventually Ground, Roof defined as "equivalent to..."
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 8:22 AM
Connected *is* a legitimate example of the need for Lower_Bound.
The fixed Connected body is
X : Cursor_Type := Lower_Bound (Links, (Source, Front_Vertex));
begin
return X /= Null_Cursor and then Element (X) (1) = Source;
Matt, your suggestion,
> function Connected (Source : Vertex) return Boolean is
> use Link_Sets;
> begin
> if Is_Empty (Source) then
> return False;
> end if;
>
> return Link_Type'(Source, Front_Vector) <= Last_Element (Links);
> end;
won't work. Apart from the obvious bugs Is_Empty (Source) which should be
Is_Empty (Links), and Front_Vector which should be Front_Vertex, the return
expression
Link_Type'(Source, Front_Vertex) <= Last_Element (Links)
does not yield as desired. Front_Vertex is a value that is never connected
and is lower than any other. Let's say Front_Vertex = 0 and Links = ((2, 3),
(2, 4)). Then Connected (1) would (erroneously) return True, because (1, 0)
<= (2, 4). You're not checking for actual membership in Links. Maybe you had
something else in mind.
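[Editor's note: the counterexample above can be checked with a small
program. This is a sketch under stated assumptions: Link_Type is taken to
be a two-element array of vertices with the predefined lexicographic
ordering, Front_Vertex is 0, and the set is the Ada 2005
Ada.Containers.Ordered_Sets, where Ceiling and No_Element play the roles of
Lower_Bound and Null_Cursor. The names Connected_Demo and Connected_By_Last
are hypothetical.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Connected_Demo is
   subtype Vertex is Natural;

   --  Assumed: a value that is never connected and is lower than any other.
   Front_Vertex : constant Vertex := 0;

   --  One-dimensional array of a discrete type, so "<" and "<=" are the
   --  predefined lexicographic comparisons.
   type Link_Type is array (1 .. 2) of Vertex;

   package Link_Sets is new Ada.Containers.Ordered_Sets (Link_Type);
   use Link_Sets;

   Links : Set;

   --  The suggested predicate (with its two typos corrected): it compares
   --  against the last element only and never checks membership.
   function Connected_By_Last (Source : Vertex) return Boolean is
   begin
      return not Is_Empty (Links)
        and then Link_Type'(Source, Front_Vertex) <= Last_Element (Links);
   end Connected_By_Last;

   --  The fixed predicate: find the first link at or after
   --  (Source, Front_Vertex) and check that it really starts at Source.
   function Connected (Source : Vertex) return Boolean is
      X : constant Cursor := Ceiling (Links, (Source, Front_Vertex));
   begin
      return X /= No_Element and then Element (X) (1) = Source;
   end Connected;

begin
   Insert (Links, (2, 3));
   Insert (Links, (2, 4));

   pragma Assert (Connected_By_Last (1));  --  wrong: (1, 0) <= (2, 4)
   pragma Assert (not Connected (1));      --  right: no link starts at 1
   pragma Assert (Connected (2));
   pragma Assert (not Connected (3));
end Connected_Demo;
```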
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 10:30 AM
<<Does[n't] Upper_Bound qualify as a "well known concept"? >>
Not terribly, no.
<<All I'm trying to do is come up with another name for Upper_Bound....
I suspect Stepanov was motivated by the set-theoretic terms "upper
bound," "least upper bound", etc. ...
Other names for Upper_Bound are: limit, supremum, supremum limit, etc.>>
Actually the term "least upper bound" in mathematics is what we have been
calling *Lower_Bound*, or Ceiling. And "greatest lower bound" is Floor. I don't
know a mathematical term for what we have been calling Upper_Bound. Which to me
indicates a bit of strangeness.
Also, Upper_Bound (let's keep calling it that):
- does not seem to be so useful as Ceiling, if at all
- can be derived with First, Last, and Slice
My previous examples demonstrate this.
But the term you're looking for might be: Roof.
****************************************************************
From: Tucker Taft
Sent: Wednesday, May 12, 2004 12:40 PM
> Have you considered my Slice, Open_Bound proposal yet?
I don't see the need for "Slice" or "Open_Bound".
These seem to be introducing a layer of "virtual"
set on top, which you could do with a new abstraction.
Is there a real efficiency need here, or just a desire
for the additional abstraction level?
For example, it seems using an Open_Bound as the high
bound of an iteration is equivalent to iterating up to
Previous(Ceiling()). You can easily create a "real"
slice by iterating from the low bound to the high
bound and insert the result in a new set. If you want
a "virtual" slice, then to me that is an additional
layer on top, and not something appropriate for the
basic Ordered_Sets abstraction.
...
> So my proposal is adding Ceiling, Floor, Slice, Open_Bound.
>
> And eventually Ground, Roof defined as "equivalent to..."
I don't see the need to go beyond Floor and Ceiling. They
seem to provide all the primitives needed to enable the
efficient construction of any operations you might want,
and I believe their meaning is more intuitive than the others
you have suggested.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 1:25 PM
> For example, it seems using an Open_Bound as the high
> bound of an iteration is equivalent to iterating up to
> Previous(Ceiling()).
This requires care, since Ceiling can return No_Element if the key is
greater than every key in the set. To make your algorithm fully general
I think you'd have to say:
declare
C : Cursor := Ceiling (S, K);
begin
if Has_Element (C) then
Previous (C);
else
C := Last (S);
end if;
...
end;
> I don't see the need to go beyond Floor and Ceiling. They
> seem to provide all the primitives needed to enable the
> efficient construction of any operations you might want,
> and I believe their meaning is more intuitive than the others
> you have suggested.
As above, the problem case is when Floor returns No_Element, because the
key is less than every key in the set. To implement an equivalent of
Upper_Bound, it's not good enough to say Next (Floor (S, K)); you have
to say instead:
declare
C : Cursor := Floor (S, K);
begin
if Has_Element (C) then
Next (C);
else
C := First (S);
end if;
...
end;
I don't know whether this is really a problem, but I just wanted to
bring it up. Having to handle the endpoints as a special case is a
consequence of the fact that we got rid of the internal sentinel node.
Another possibility is to restore the sentinel, and then define rules
for how it compares to the deferred constant No_Element. Assuming type
Cursor is defined as:
type Node_Type is record -- red-black tree node
Color : Color_Type;
...
end record;
type Cursor is record
Node : Node_Access;
end record;
No_Element : constant Cursor := (Node => null);
function Has_Element (C : Cursor) return Boolean is
begin
if C.Node = null then
return False;
end if;
if C.Node.Color = White then -- sentinel has special color
return False;
end if;
return True;
end;
function "=" (L, R : Cursor) return Boolean is
begin
if L.Node = null
or else L.Node.Color = White
then
return R.Node = null or else R.Node.Color = White;
end if;
if R.Node = null
or else R.Node.Color = White
then
return False;
end if;
return L.Node = R.Node;
end;
The problem of course is that "=" for type Cursor overrides predefined
"=", which means predefined "=" re-emerges when type Cursor is a record
or array component, or when type Cursor is a generic actual type.
I suppose we could privately tag type Cursor, to guarantee that
predefined "=" never re-emerges. I was trying to avoid that, however.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 1:47 PM
"Lower_Limit(S,K) = Previous(Ceiling(S,K))"
You mean
Lower_Limit (S, K) = Previous (Floor (S, K)).
But this fails when Floor (S, K) < K.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 1:56 PM
No. Tucker was correct.
****************************************************************
From: Tucker Taft
Sent: Wednesday, May 12, 2004 2:21 PM
No, I meant what I wrote, based on Matt's specification
that Key(Lower_Limit(S,K)) < K. I'm not sure you
and Matt have the same definition in mind for all
these functions. In particular I get the sense
that your definition of Lower_Bound is the opposite
of his. I understand the notion of Greatest_Lower_Bound
on a lattice, but I have never quite understood how
that relates to Lower_Bound.
In any case, I was focusing on the specifications
that Matt gave for Lower_Limit and Upper_Limit,
and based my equivalence on those.
And I realize my equivalence fails at the end points,
but I suspect that some special handling may be required
for those in any case, and it is easy enough for the
user to define a function that does what is desired
(e.g. Previous_Or_Last() which returns Last when given
No_Element).
> But this fails when Floor (S, K) < K.
That's why I wrote Previous(Ceiling(S,K)).
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 4:30 PM
Oops, sorry. *I* was confused.
By the way, I checked the names so far, and they are (aligned, but in no
specific order):
Version 1.1  Mathematics           My names  Matthew's    AKAs, other
----------------------------------------------------------------------------
Lower_Bound  least upper bound     Ceiling   Ceiling
Upper_Bound                        Roof      Upper_Limit
             greatest lower bound  Floor     Floor        Reverse_Lower_Bound
                                   Ground    Lower_Limit  Reverse_Upper_Bound
             lower limit                                  First
             upper limit                                  Last
----------------------------------------------------------------------------
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 2:18 PM
> I don't see why you need them. It seems
> Lower_Limit(S,K) = Previous(Ceiling(S,K)) and
> Upper_Limit(S,K) = Next(Floor(S,K))
Thinking about this issue some more, there might be a way to create
these semantics without a sentinel. If a cursor is implemented this way:
type Cursor is record
Container : Set_Access;
Node : Node_Access;
end record;
In which case you could implement Previous as:
function Previous (C : Cursor) return Cursor is
begin
if C.Container = null then --No_Element
return C;
end if;
if C.Node = null then --pseudo-sentinel
return C; --or: Last (C.Container)
end if;
if C = First (C.Container) then
return (C.Container, null); --pseudo-sentinel
end if;
return Previous (C.Container.Tree, C.Node);
end;
Next would be implemented similarly.
The only issue here is that Previous (First (S)) /= No_Element (the LHS
has a non-null set pointer, the RHS has a null set pointer). I don't
know if this is an issue.
****************************************************************
From: Tucker Taft
Sent: Wednesday, May 12, 2004 2:30 PM
I don't think we need to change
"Previous" to make these equivalences work for
endpoints. Just let the user write a
"Previous_Or_Last" if they really want to,
which would need to take both a cursor and a set.
Or more directly, write Lower_Limit or Upper_Limit
if you want them, since these already have enough
information with the set and the key.
Providing Ceiling and Floor still seems adequate to me,
as they provide the needed primitives for all other
operations mentioned thus far.
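[Editor's note: a minimal sketch of the point made here, that Lower_Limit
and Upper_Limit can be written by the user on top of Floor and Ceiling,
with the endpoint cases handled explicitly. It assumes the Ada 2005
Ada.Containers.Ordered_Sets and the semantics given earlier in the thread,
Key (Lower_Limit (S, K)) < K and Key (Upper_Limit (S, K)) > K. The two
functions are hypothetical user code, not part of any draft.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Limits_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Integer_Sets;

   --  Greatest element strictly less than K, or No_Element if none.
   function Lower_Limit (S : Set; K : Integer) return Cursor is
      C : constant Cursor := Ceiling (S, K);
   begin
      if Has_Element (C) then
         return Previous (C);
      else
         return Last (S);   --  K exceeds every element (or S is empty)
      end if;
   end Lower_Limit;

   --  Smallest element strictly greater than K, or No_Element if none.
   function Upper_Limit (S : Set; K : Integer) return Cursor is
      C : constant Cursor := Floor (S, K);
   begin
      if Has_Element (C) then
         return Next (C);
      else
         return First (S);  --  K is below every element (or S is empty)
      end if;
   end Upper_Limit;

   S : Set;
begin
   Insert (S, 10);
   Insert (S, 20);
   Insert (S, 30);

   pragma Assert (Element (Lower_Limit (S, 20)) = 10);
   pragma Assert (Element (Lower_Limit (S, 25)) = 20);
   pragma Assert (Element (Lower_Limit (S, 99)) = 30);  --  endpoint case
   pragma Assert (Lower_Limit (S, 10) = No_Element);

   pragma Assert (Element (Upper_Limit (S, 20)) = 30);
   pragma Assert (Element (Upper_Limit (S, 5)) = 10);   --  endpoint case
   pragma Assert (Upper_Limit (S, 30) = No_Element);
end Limits_Demo;
```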
****************************************************************
From: Matthew Heaney
Sent: Wednesday, May 12, 2004 2:36 PM
OK. That seems reasonable. I just wanted to make sure we were on the
same page w.r.t the behavior at the endpoints.
****************************************************************
From: Marius Amado Alves
Sent: Wednesday, May 12, 2004 1:28 PM
<<I don't see the need for "Slice" or "Open_Bound".
These seem to be introducing a layer of "virtual"
set on top, which you could do with a new abstraction.
Is there a real efficiency need here, or just a desire
for the additional abstraction level?>>
Efficiency. Slice is a simple way of passing known bounds to *any* operation.
As an example consider the usual scenario from accounting where you have
invoices, and each invoice has a variable number of items. The relational
representation of this database includes a set Items of (Invoice_Id, Item_Id)
pairs, ordered by (Invoice_Id, Item_Id). You want to insert a new invoice X
with items A and B. Without Slice you do:
Insert (Items, (X, A), Point_XA, Ok);
Insert (Items, (X, B), Point_XB, Ok);
Each time Insert will have to search for the insertion point from the start
(e.g. from the root of a binary tree). But clearly Point_XA is close to
Point_XB, so if there was a way of telling Insert that we are inserting (X, B)
next to Point_XA, Insert could start looking from there to great advantage.
Slice provides that way.
Insert
(Slice (Items, Point_XA, Last (Items)),
(X, B), Ok);
You could even save some extra micro-seconds writing:
Insert
(Slice (Items, Open_Bound (Point_XA), Last (Items)),
(X, B), Ok);
[Of course there are other ways, not relational, of representing the data. For
example, Items could be a set of pairs (Invoice_Id, Item_Set), where Item_Set
is a set of items. But there are a number of reasons why you might want the
relational scheme. One is that with this scheme you can search Items by
properties of Item_Id. For example you might want to know which invoices sold
part number 12345. One more subtle reason--not applicable to this example, but
occurring in other common situations--has to do with the unfortunate fact that
it is not possible to have recursive containers without resorting to a pointer
idiom. There are other reasons.]
****************************************************************
From: Tucker Taft
Sent: Wednesday, May 12, 2004 3:07 AM
> Efficiency. Slice is a simple way of passing known
> bounds to *any* operation....
If I understand you, Slice is not a copy, but a by-reference
subset of a set, created for the purpose of improving performance.
I don't find this example sufficiently compelling to include it
in a basic capability like Ordered_Sets. It requires significant
set up by the user, and it seems possible that in some implementations,
it would be a waste of energy.
I like "Ceiling" and "Floor" because they address the common
notion of "nearest" element or approximate match, something
which makes sense to ask in a set. Slice and Open_Bound
seem to only serve some more obscure performance concern,
which I don't see as being of wide or general usefulness.
All these things involve subtle tradeoffs, and I accept you
might make different choices, but we are looking to provide
the 20% of all possible set operations that together meet
the needs of 80% of the typical users of sets.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 6:34 AM
> > Efficiency. Slice is a simple way of passing known
> > bounds to *any* operation....
>
> If I understand you, Slice is not a copy, but a by-reference
> subset of a set, created for the purpose of improving performance.
Exactly. It *must not* be a copy.
> I don't find this example sufficiently compelling to include it
> in a basic capability like Ordered_Sets. It requires significant
> set up by the user, and it seems possible that in some implementations,
> it would be a waste of energy.
The setup is not significant because the user can always ignore the slice
idiom, and/or only use it when the known bounds have been acquired naturally
from previous operations required by the application logic, as was the case
in the invoices example.
The implementation is easy, especially if null optimization is allowed, as I
proposed (a slice obviously knows about its base container, so an
unoptimized operation can just call itself with the base). But in most
implementations, namely using trees or skip lists, the implementation of
non-null optimization is also easy, because usually the internal search
primitives are recursive operations accepting bounds expressed as a node or
nodes, and the Cursor type is likely to have node information, as in
Matt's previous study. So not a waste of energy. The implementation is
already there.
And here's one more real life example. Website access analysis. You want to
identify sessions from a HTTP requests log file. A session is a sequence of
requests from the same IP such that the time between each consecutive
request does not exceed 30 minutes (this is a common criterion). You want to
update each request with the corresponding (computed) session id. You key
the access log by (IP, Time), and traverse the entire file to effect this
logic. You will have naturally collected bounds for the fine search and
update operations. Rather strict bounds, giving you (in a non-null
optimization implementation) orders of magnitude gains in time. The usual
application of website analysis is for huge log files, of tens of millions of
accesses. The gains could mean a difference from hours to minutes or
seconds. I've done this stuff using database systems (Postgres, MySQL), and
the scripts ran four hours. I wasn't able to optimize further because of the
same reasons we're discussing here: lack of ways to pass known bounds to the
core data engine. I've done this kind of work in several real life
applications, including
http://soleunet.ijs.si/website/other/final_report/html/WP5-s9.html
Note that with the increasing availability of large RAM, the tendency is
towards *prevalent* systems, where all data required for search and
retrieval is held in RAM during work. An optimized Ada.Containers library
could mean a great plus for Ada in this area. Databases have been identified
as a promising area for Ada. Perhaps the DADAISM project would not have stalled if
there were optimized Ada.Containers around then.
Open_Bound is not strictly required for optimization, but together with
Slice it provides a means to express any kind of interval.
I understand the 20/80 rule. It's just that in my perception the addition of
Slice configures 20.5/90 or so. Say 21/95 further adding Open_Bound.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 7:36 AM
Note that Slice is also useful for non-optimization purposes. For example,
currently to process a "range" you must use the "active" iterator idiom:
I : Cursor := From;
begin
while I <= To loop
Process (I);
Next (I);
end loop;
With Slice you have access to the "passive" idiom right out of the box:
procedure Iterate is new Generic_Iteration;
begin
Iterate (Slice (S, From, To));
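[Editor's note: for comparison, a passive iteration over a closed range
can also be layered on top of Ceiling without a Slice operation. The sketch
below assumes the Ada 2005 Ada.Containers.Ordered_Sets; Iterate_Range is a
hypothetical user-written helper, not part of any draft, and it takes
element values rather than cursors as its bounds.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Passive_Range_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Integer_Sets;

   --  Hypothetical helper: calls Process on each element X of S with
   --  From <= X <= To, in increasing order.
   procedure Iterate_Range
     (S        : Set;
      From, To : Integer;
      Process  : not null access procedure (Position : Cursor))
   is
      I : Cursor := Ceiling (S, From);  --  smallest element >= From
   begin
      while Has_Element (I) and then Element (I) <= To loop
         Process (I);
         Next (I);
      end loop;
   end Iterate_Range;

   S     : Set;
   Total : Integer := 0;

   procedure Add (Position : Cursor) is
   begin
      Total := Total + Element (Position);
   end Add;

begin
   Insert (S, 1);
   Insert (S, 5);
   Insert (S, 9);
   Insert (S, 12);

   Iterate_Range (S, 5, 10, Add'Access);  --  visits 5 and 9
   pragma Assert (Total = 14);
end Passive_Range_Demo;
```

[This gives the passive idiom for the iteration case only; it does not
address the search-optimization use of Slice argued for elsewhere in this
thread.]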
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 8:08 AM
> ...you could use the new Generic_Update procedure:
>
> procedure For_Each_Link_In_Range
> (Set : Link_Sets.Set_Type; From, To : Link_Type)
> is
> pragma Assert (From <= To);
>
> use Link_Sets;
>
> procedure Process (E : in out Link_Type) is
> begin
> ...; --whatever
> end;
>
> procedure Update is new Generic_Update;
>
> I : Cursor_Type := Lower_Bound (Set, From);
> J : constant Cursor_Type := Upper_Bound (Set, To);
> begin
> while I /= J loop
> Update (I);
> Increment (I);
> end loop;
> end;
Generic_Update is excellent stuff. It does not apply in this particular case
though, because Mneson links are immutable by design (can only be created or
deleted, never changed). But there are a lot of element update situations in
other applications, and so having Generic_Update is a great improvement from
version 1.1 of the spec and corresponding reference implementation (that is
the one currently used by Mneson, and that does not have
Generic_Update--good thing that Mneson does not need them :-).
****************************************************************
From: Matthew Heaney
Sent: Thursday, May 13, 2004 10:24 AM
This will have to be written as
I : Cursor := Ceiling (Set, From);
J : Cursor := Floor (Set, To);
begin
if J = No_Element then --To is small key
pragma Assert (I = First (Set));
return;
end if;
Next (J); --now has value of Upper_Bound
while I /= J loop
Update (I);
Next (I);
end loop;
end;
> Generic_Update is excellent stuff. It does not apply in this particular case
> though, because Mneson links are immutable by design (can only be created or
> deleted, never changed). But there are a lot of element update situations in
> other applications, and so having Generic_Update is a great improvement from
> version 1.1 of the spec and corresponding reference implementation (that is
> the one currently used by Mneson, and that does not have
> Generic_Update--good thing that Mneson does not need them :-).
Generic_Update is equivalent to Generic_Element. The only difference is
that Generic_Update doesn't require the element to be aliased. It
provides no new functionality relative to the 1.1 spec.
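[Editor's note: The Ceiling/Floor range loop above can be sketched as a complete program. This is only an illustration under stated assumptions: it uses the names of the later standard package Ada.Containers.Ordered_Sets rather than the draft under discussion, and Range_Demo, Integer_Sets, Visit_Range, and Visited are hypothetical names invented for the example.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Range_Demo is
   --  Assumption: the standardized Ada.Containers.Ordered_Sets package,
   --  not the AI-302 draft API discussed in this thread.
   package Integer_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Integer_Sets;

   S       : Set;
   Visited : Natural := 0;

   --  Visit every element E of S with From <= E <= To (assumes From <= To).
   procedure Visit_Range (From, To : Integer) is
      I : Cursor := Ceiling (S, From);  --  smallest element >= From
      J : Cursor := Floor (S, To);      --  largest element <= To
   begin
      if I = No_Element or else J = No_Element then
         return;  --  nothing at or above From, or nothing at or below To
      end if;
      Next (J);  --  one past the end of the range; may be No_Element
      while I /= J loop
         Visited := Visited + 1;
         Ada.Text_IO.Put_Line (Integer'Image (Element (I)));
         Next (I);
      end loop;
   end Visit_Range;

begin
   for K in 1 .. 5 loop
      Insert (S, K * 10);  --  S = {10, 20, 30, 40, 50}
   end loop;
   Visit_Range (From => 15, To => 45);  --  visits 20, 30, 40
   pragma Assert (Visited = 3);
end Range_Demo;
```

[Both searches start from the root of the set, so locating the endpoints stays logarithmic; only the walk between them is linear in the number of elements visited.]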
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 12:13 PM
I see. I didn't need it for Mneson so it was there in the book but not in my
mind.
Anyway Generic_Update is better because it's pointerless :-)
****************************************************************
From: Tucker Taft
Sent: Thursday, May 13, 2004 10:33 AM
I guess I am still not convinced. If you use a binary
tree, having a cursor pointing into the tree is not
always terribly useful when you are trying to search
for some subsequent element with a given key. You will
often have to go back "up" several levels before being
able to go back down. With "Slice" you are forcing
every operation to support a "virtual" subset as well
as a real set. This is going to inevitably introduce
some distributed overhead. I would be surprised if on
balance, this is a net savings. I'm sure you could
construct a case where it would be a savings, but overall,
I would expect the mix of uses would favor keeping the
abstraction simpler.
An alternative is to have additional versions of operations
like "Find" and "Delete" which take a "Starting_With" cursor
parameter. (This may be something that was there to begin
with, I have forgotten.) Those might be useful, but still
they seem like operations that might sometimes be slower
than starting at the "top" of the binary tree, depending
on exactly where in the tree the Starting_With cursor points.
The added complexity to the interface just doesn't seem worth it.
There is certainly nothing preventing someone defining a
"Very_Ordered_Set" or whatever that has more of these operations,
making it closer to a "Vector" in interface. Or it could be
a generic child of Ordered_Set. I just don't
think the justification is there for our initial attempt at
a standard container library to include these additional
capabilities.
****************************************************************
From: Matthew Heaney
Sent: Thursday, May 13, 2004 10:59 AM
> An alternative is to have additional versions of operations
> like "Find" and "Delete" which take a "Starting_With" cursor
> parameter. (This may be something that was there to begin
> with, I have forgotten.) Those might be useful, but still
> they seem like operations that might sometimes be slower
> than starting at the "top" of the binary tree, depending
> on exactly where in the tree the Starting_With cursor points.
That's pretty much my feeling. It's hard to know a priori whether it's
faster to find an item by starting at the top and then searching the
tree, or starting from some point in the tree and then searching linearly.
Starting from the top does have the benefit that we can definitely say
that the time complexity is O(log n) even in the worst case, which is
why I re-wrote Mario's example to use a top-down search.
I agree that Slice, Open_Range, etc, aren't necessary. However, you
could make an argument for including an Upper_Bound style function,
since it's more efficient than the expression Next(Floor(S,K)), and
because it handles the endpoint issue automatically.
In fact I think the issue of endpoint is the more compelling argument
for including Upper_Bound (with some other name, of course), since even
trying to write Mario's example sans Upper_Bound required a bit of
mental effort. Maybe call it Next_Ceiling or Upper_Ceiling or whatever.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 12:48 PM
Slice does not complicate the abstraction, on the contrary, cf. my example
about iterating a range.
I agree tree implementations might have trouble optimizing certain cases.
But in those cases they can just start at the root, as for a non-slice.
But, yes, there is a slight overhead even then, namely for detecting the
kind of case.
Skiplist and hashtable implementations might do better though.
But remember it's not just about optimization, it's also about expressing
ranges declaratively.
Just some final thoughts. By now I think I've made the case for Slice.
Personally as a user I'd like it there. But I might be too biased a user
(towards databases). I'm all confident you'll make the right choice. And as
you point out there is always space for an Ada.Containers.Optimized_Sets
package (*), which can mean business for independent Ada tool developers :-)
(*) Is there? I was under the impression that the RM ruled-out extensions to
package Ada. But the Ada.Containers spec talks about them as if they were
legal. Sorry for the newbie question.
****************************************************************
From: Martin Dowie
Sent: Thursday, May 13, 2004 2:08 PM
> (*) Is there? I was under the impression that the RM ruled-out extensions to
> package Ada. But the Ada.Containers spec talks about them as if they were
> legal. Sorry for the newbie question.
I believe the rule is you can't add Child packages to package "Ada" but you
can add Grand-Child and extend existing Child packages.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, May 13, 2004 1:25 PM
(Damn I said I was done but you keep asking for it :-)
> > An alternative is to have additional versions of operations
> > like "Find" and "Delete" which take a "Starting_With" cursor
> > parameter.
I fail to see how duplicating Insert, Delete, Is_In, and Find complicates the
interface less than simply adding Slice.
> > (This may be something that was there to begin
> > with, I have forgotten.)
There was, but only for Insert, and the known position had to be adjacent to
the new one.
> > Those might be useful, but still
> > they seem like operations that might sometimes be slower
> > than starting at the "top" of the binary tree, depending
> > on exactly where in the tree the Starting_With cursor points.
>
> That's pretty much my feeling. It's hard to know a priori whether it's
> faster to find an item by starting at the top and then searching the
> tree, or starting from some point in the tree and then searching linearly.
This happens for either Slice or Starting_With. Actually Slice has more
information (the upper bound), which can help make a better decision.
> Starting from the top does have the benefit that we can definitely say
> that the time complexity is O(log n) even in the worst case, which is
> why I re-wrote Mario's example to use a top-down search.
Can you pinpoint please? (Not pressing.)
> I agree that Slice, Open_Range, etc, aren't necessary. However, you
> could make an argument for including an Upper_Bound style function,
> since it's more efficient than the expression Next(Floor(S,K)), and
> because it handles the endpoint issue automatically.
>
> In fact I think the issue of endpoint is the more compelling argument
> for including Upper_Bound (with some other name, of course), since even
> trying to write Mario's example sans Upper_Bound required a bit of
> mental effort.
Again, can you pinpoint please? (Not pressing.)
> Maybe call it Next_Ceiling or Upper_Ceiling or whatever.
I take it you don't like Roof :-(
****************************************************************
From: Randy Brukardt
Sent: Thursday, May 13, 2004 11:45 AM
> I guess I am still not convinced. If you use a binary
> tree, having a cursor pointing into the tree is not
> always terribly useful when you are trying to search
> for some subsequent element with a given key. You will
> often have to go back "up" several levels before being
> able to go back down. With "Slice" you are forcing
> every operation to support a "virtual" subset as well
> as a real set. This is going to inevitably introduce
> some distributed overhead. I would be surprised if on
> balance, this is a net savings. I'm sure you could
> construct a case where it would be a savings, but overall,
> I would expect the mix of uses would favor keeping the
> abstraction simpler.
I totally agree. Moreover, there is overhead from requiring every
implementation of Sets to support by-reference, not copied set objects.
(That is, the result of Slice). Moreover, you're introducing even more
erroneous cases into the library.
Matt will be happy to tell you how hard I tried to eliminate *all*
erroneousness from the containers library. He eventually convinced me that
some cases of dangling cursors cannot be detected (that is, those that point
into container objects that no longer exist). So some erroneousness is
inevitable; but I'm very opposed to having it where it is not required.
(Note that the erroneous cases come from the non-OOP design of the library.
If the container object was a parameter to all operations [as it ought to
be, IMHO], then there would be no need for erroneous cases. But that's water
under the dam. :-)
****************************************************************
From: Nick Roberts
Sent: Friday, May 14, 2004 2:43 PM
I am generally delighted by this amendment, and I hope it goes in. I think
it shows how the knocking together of many wise heads generally produces a
good result (even if it is only after an awful lot of argument :-)
It does seem clear to me that a comprehensive set of packages could easily
have numbered in the hundreds, when one considers the combinations of
different structures and the selection between bounded and unbounded,
definite and indefinite, and so on. I haven't counted, but Booch is over a
hundred isn't it?
I have a few queries. My profuse apologies if any of these have already been
addressed (and I've missed them).
[1] The vectors and maps are intended to automatically expand when required.
This is fine, but the interface seems to provide no control over this
expansion at all. Would it perhaps be a good idea to add a generic parameter
such as below?
Expansion_Size: Size_Type := [implementation defined];
The idea is that automatic expansion is done in multiples of Expansion_Size.
It has a default value, so that it can be conveniently ignored by the user.
A possible alternative is:
Expansion_Factor: Float := [implementation defined];
The idea here is that automatic expansion of a map or vector X is by
Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient
default.
Alternatively, ExpansionSize/Factor could be made a visible discriminant of
the container types, or an invisible attribute (with appropriate get and set
operations).
[2] What was the reason for not permitting Resize to make a container
smaller, please?
[3] I'd quite like the amendment to add a paragraph near the top clarifying
the idea that every container has a set of 'slots', and that each slot can
be either empty or contain the (valid?) value of one element. The following
descriptions could, I think, be made slightly clearer and more succinct by
referring to these slots. (Would you like specific wording?)
[4] Regarding the optimisation of operations, I suggest it may be possible
for an implementation to keep enough extra internal information (in a Set
object) to enable it to detect and optimise various scenarios (judged to be
typical).
For example, assuming a tree structure, a pointer to the node above the
(terminal) node most recently inserted could be retained; the implementation
could test each insertion to see if it falls under this node; if a sequence
of insertions of (as it turns out) adjacent values occurs, this trick could
yield a very good speed improvement.
[5] Probably already mentioned, but in line 3364 'Assert (Target => V1,
Source => V2);' should be 'Assign (Target => V1, Source => V2);'.
Finally, is there a sample implementation of any of these packages yet?
****************************************************************
From: Matthew Heaney
Sent: Thursday, May 13, 2004 3:05 PM
Nick Roberts wrote:
> I am generally delighted by this amendment, and I hope it goes in. I think
> it shows how the knocking together of many wise heads generally produces a
> good result (even if it is only after an awful lot of argument :-)
Most of the argument you didn't even see...
> It does seem clear to me that a comprehensive set of packages could easily
> have numbered in the hundreds, when one considers the combinations of
> different structures and the selection between bounded and unbounded,
> definite and indefinite, and so on. I haven't counted, but Booch is over a
> hundred isn't it?
Booch is large. But my original AI-302 proposal was large too: I think
there were something like 25 containers (some of them had bounded and
unbounded forms, etc), and the proposal itself was about 150 pgs.
> I have a few queries. My profuse apologies if any of these have already been
> addressed (and I've missed them).
>
> [1] The vectors and maps are intended to automatically expand when required.
Yes.
> This is fine, but the interface seems to provide no control over this
> expansion at all.
No. That's what Resize is for.
> Would it perhaps be a good idea to add a generic parameter
> such as below?
>
> Expansion_Size: Size_Type := [implementation defined];
Use Resize.
> The idea is that automatic expansion is done in multiples of Expansion_Size.
> It has a default value, so that it can be conveniently ignored by the user.
> A possible alternative is:
>
> Expansion_Factor: Float := [implementation defined];
Use Resize to supply a hint about intended maximum length. The
implementation then resizes the container according to the algorithm the
vendor has chosen.
> The idea here is that automatic expansion of a map or vector X is by
> Size_Type(Expansion_Factor*Float(Size(X))). Again there is a convenient
> default.
In the AI-302 reference implementation, the array is automatically
expanded to twice its current size.
> Alternatively, ExpansionSize/Factor could be made a visible discriminant of
> the container types, or an invisible attribute (with appropriate get and set
> operations).
The container types do not have discriminants.
> [2] What was the reason for not permitting Resize to make a container
> smaller, please?
Make a copy of the container, Clear the original, and then Move the copy
to the original. (Wasn't this in the examples section?)
> [4] Regarding the optimisation of operations, I suggest it may be possible
> for an implementation to keep enough extra internal information (in a Set
> object) to enable it to detect and optimise various scenarios (judged to be
> typical).
>
> For example, assuming a tree structure, a pointer to the node above the
> (terminal) node most recently inserted could be retained; the implementation
> could test each insertion to see if it falls under this node; if a sequence
> of insertions of (as it turns out) adjacent values occurs, this trick could
> yield a very good speed improvement.
Earlier releases of the AI-302 draft had overloadings of Insert that had
a hint parameter, which, if it were successfully used to perform the
insertion, then the time complexity would be O(1) instead of O(log n).
However, the insert-with-hint operations were removed from the API at
the ARG meeting in Phoenix.
> Finally, is there a sample implementation of any of these packages yet?
<http://charles.tigris.org/>
See the ai302 subdirectory.
The vector containers in the ai302 subdirectory conform to the most
recent AI-302 draft (dated 2004/04/29). Look for updates to the
remaining containers this weekend. (I recommend simply joining the
charles project mailing lists, so you get notified automatically.)
****************************************************************
From: Randy Brukardt
Sent: Friday, May 14, 2004 10:09 PM
> [1] The vectors and maps are intended to automatically expand when required.
> This is fine, but the interface seems to provide no control over this
> expansion at all.
That's intentional. The implementation is allowed to choose the expansion
algorithm that makes the most sense for its architecture. Resize can be
used to tell the implementation the ultimate size; there is an AARM note to
mention to implementors that it is intended that this do the allocations
needed. Matt claims that Resize often can be used in practice (I'm
skeptical), but when it can't be used, you really don't have enough
information to choose at all.
> [2] What was the reason for not permitting Resize to make a container
> smaller, please?
The same reason that deleting an element doesn't necessarily destroy the
element. We wanted to give the implementation flexibility in using blocking,
caching, etc. The only operation that is guaranteed to recover space is the
destruction of the container.
Matt shows that it can be done by jumping through hoops, so there is a way
to do it in the rare case that it is needed.
> [3] I'd quite like the amendment to add a paragraph near the top clarifying
> the idea that every container has a set of 'slots', and that each slot can
> be either empty or contain the (valid?) value of one element. The following
> descriptions could, I think, be made slightly clearer and more succinct by
> referring to these slots. (Would you like specific wording?)
It's not necessary, and makes things read more like a description of a
specific implementation. We want as abstract a description as possible. We
spent quite a bit of effort getting rid of such wording from the vector and
maps containers (there should be no further reference to "nodes" in those
containers). I would have done the same to the other containers if I would
have had more time and energy.
> [5] Probably already mentioned, but in line 3364 'Assert (Target => V1,
> Source => V2);' should be 'Assign (Target => V1, Source => V2);'.
Yes, and I've fixed all of the typos noted by Dan and Christoph in the
working version -- so the ARG won't need to consider them in Palma.
****************************************************************
From: Matthew Heaney
Sent: Friday, May 14, 2004 11:25 PM
> Matt shows that it can be done by jumping through hoops, so
> there is a way to do it in the rare case that it is needed.
Just to add to what Randy said: the point of Resize is to prevent
automatic expansion that would otherwise occur as items are inserted
into the container. It's not influencing the size that's important per
se; rather, it's disabling expansion.
If you ever need to shrink a vector (say), then just do this:
Shrink:
declare
Temp : Vector := V;
begin
Clear (V);
Move (Target => V, Source => Temp);
end Shrink;
Note that I've been an STL user for 4 years now, and I've never actually
had a need to shrink a vector. Most of the time I use a vector to store
a large index or whatever, and usually I can determine prior to
insertion how many items I'm going to insert, so I call Resize first.
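[Editor's note: A runnable sketch of the shrink idiom above, under stated assumptions. It is written against the later standard package Ada.Containers.Vectors, where the operation that appears to correspond to the draft's Resize is named Reserve_Capacity and the draft's Size is named Capacity. Shrink_Demo and Integer_Vectors are hypothetical names. Whether the capacity actually drops after the Move is implementation-dependent, so the program only prints it and asserts nothing about it.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Shrink_Demo is
   use Ada.Containers;
   --  Assumption: the standardized Ada.Containers.Vectors package,
   --  not the AI-302 draft API discussed in this thread.
   package Integer_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   use Integer_Vectors;

   V : Vector;
begin
   --  Reserve far more room than is needed, then add only a few items.
   Reserve_Capacity (V, 100_000);
   for K in 1 .. 10 loop
      Append (V, K);
   end loop;
   Ada.Text_IO.Put_Line
     ("Capacity before:" & Count_Type'Image (Capacity (V)));

   Shrink :
   declare
      Temp : Vector := V;  --  the element copy happens here
   begin
      Clear (V);
      Move (Target => V, Source => Temp);  --  no element copy intended here
   end Shrink;

   --  The contents survive the round trip; the capacity is only reported.
   pragma Assert (Length (V) = 10);
   pragma Assert (Element (V, 10) = 10);
   Ada.Text_IO.Put_Line
     ("Capacity after:" & Count_Type'Image (Capacity (V)));
end Shrink_Demo;
```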
****************************************************************
From: Randy Brukardt
Sent: Friday, May 14, 2004 11:24 PM
I think you meant "Assign" rather than "Move", as Move just copies the existing
internal contents (thus preserving the size). "Assign" would make the target
only as large as necessary.
****************************************************************
From: Matthew Heaney
Sent: Saturday, May 15, 2004 1:54 AM
No, you've got it backwards. Move does indeed preserve the size -- of
the source. Here, Temp has the minimum size necessary to store the
Length (V) elements of V (although the API doesn't actually specify
this).
Note that Move doesn't copy any elements. The copying happened during
assignment of V to Temp.
Assign copies the active elements of Source onto the existing internal
array of Target, so it doesn't modify the size unless Length (Source) >
Size (Target).
****************************************************************
From: Nick Roberts
Sent: Saturday, May 15, 2004 8:39 AM
> > [2] What was the reason for not permitting Resize to make a container
> > smaller, please?
>
> The same reason that deleting an element doesn't necessarily destroy the
> element. We wanted to give the implementation flexibility in using blocking,
> caching, etc. The only operation that is guaranteed to recover space is the
> destruction of the container.
Well, it may seem like nitpicking, but that seems to be a reason to /allow/
the implementation /not/ to (actually) shrink a container. It doesn't seem
like a reason to /disallow/ the implementation from shrinking it. Surely
allowing an implementation to shrink if it wishes would provide the
greatest flexibility?
I suspect, with respect, that you are being a bit hopeful if you expect
implementations to use blocking, caching, or other optimisations. I doubt
that many will, in practice. And with an implementation close to the model,
there would be no difficulty in shrinking (by reallocation and copying, as
for enlargement). Actually, I think shrinking would probably be feasible for
most implementations, maybe all.
Again, I guess that's arguing the case as strongly as it can be.
> > [3] I'd quite like the amendment to add a paragraph near the top
> > clarifying the idea that every container has a set of 'slots', and that
> > each slot can be either empty or contain the (valid?) value of one
> > element. The following descriptions could, I think, be made slightly
> > clearer and more succinct by referring to these slots. (Would you
> > like specific wording?)
>
> It's not necessary, and makes things read more like a description of a
> specific implementation. We want as abstract a description as possible. We
> spent quite a bit of effort getting rid of such wording from the vector and
> maps containers (there should be no further reference to "nodes" in those
> containers). I would have done the same to the other containers if I would
> have had more time and energy.
Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you
could even say that in the description. I do really think it could
significantly clarify the descriptions. I could do some actual wording, if
you wish.
****************************************************************
From: Nick Roberts
Sent: Saturday, May 15, 2004 8:39 AM
> > [1] The vectors and maps are intended to automatically expand when required.
> > This is fine, but the interface seems to provide no control over this
> > expansion at all.
> > Would it perhaps be a good idea to add a generic parameter
> > such as below?
> >
> > Expansion_Size: Size_Type := [implementation defined];
> > Expansion_Factor: Float := [implementation defined];
>
> Use Resize to supply a hint about intended maximum length. The
> implementation then resizes the container according to the algorithm the
> vendor has chosen.
> In the AI-302 reference implementation, the array is automatically
> expanded to twice its current size.
This seems to correspond with the idea of having something like:
Expansion_Factor: Float := 2.0;
as a generic parameter.
Such a parameter would not interfere with the use of Resize, wherever the
user could or wanted to use it (and which would certainly be superior where
it could be used). However, it would provide a small extra measure of
control for the user.
An implementation could partially or entirely ignore the value of
Expansion_Factor, if there were better criteria for it to base the decision
on. Since it has a default value, it does not get in the way of the user who
doesn't want to use it.
I don't think its addition would add much complexity to the specifications,
or much burden to implementations. It would actually simplify some
implementations, wouldn't it?
I seem to remember that, back in the days when computers (operating systems)
had fixed-length files on their hard disks, you could usually specify an
expansion size for a file. A file would be automatically reallocated,
expanded by its expansion size, when necessary (just like a vector in the AI,
curiously).
Okay, I think I've argued the case for this feature as strongly as possible
now :-)
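[Editor's note: To make the two proposed policies concrete, the following sketch counts how many reallocations each one would trigger while a container grows to a given length. It is a hypothetical model only and not part of any AI-302 draft: Growth_Demo, Reallocations_By_Factor, and Reallocations_By_Step are invented names, the factor is an integer rather than the Float suggested above, and real implementations are free to expand differently.]

```ada
with Ada.Text_IO;

procedure Growth_Demo is

   --  Multiplicative policy: start at capacity 1 and multiply the
   --  capacity by Factor each time it is exhausted.
   --  Factor must be at least 2, or the capacity would never grow.
   function Reallocations_By_Factor
     (N : Positive; Factor : Positive) return Natural
   is
      Capacity : Positive := 1;
      Count    : Natural  := 0;
   begin
      while Capacity < N loop
         Capacity := Capacity * Factor;
         Count    := Count + 1;
      end loop;
      return Count;
   end Reallocations_By_Factor;

   --  Additive policy: start empty and add Step slots each time the
   --  capacity is exhausted.
   function Reallocations_By_Step
     (N : Positive; Step : Positive) return Natural
   is
      Capacity : Natural := 0;
      Count    : Natural := 0;
   begin
      while Capacity < N loop
         Capacity := Capacity + Step;
         Count    := Count + 1;
      end loop;
      return Count;
   end Reallocations_By_Step;

begin
   --  Doubling reaches 100_000 at capacity 131_072, after 17 expansions.
   pragma Assert (Reallocations_By_Factor (100_000, 2) = 17);
   --  Growing by 1_000 slots at a time takes 100 expansions.
   pragma Assert (Reallocations_By_Step (100_000, 1_000) = 100);
   Ada.Text_IO.Put_Line
     ("Doubling:" & Natural'Image (Reallocations_By_Factor (100_000, 2)));
   Ada.Text_IO.Put_Line
     ("Fixed step:" & Natural'Image (Reallocations_By_Step (100_000, 1_000)));
end Growth_Demo;
```

[In this model a single up-front call reserving the final length replaces all of these expansions, which is the case Resize is aimed at.]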
> > [2] What was the reason for not permitting Resize to make a container
> > smaller, please?
>
> Make a copy of the container, Clear the original, and then Move the copy
> to the original. (Wasn't this in the examples section?)
Yes, but that doesn't answer my question, Matt!
> > Finally, is there a sample implementation of any of these packages yet?
>
> <http://charles.tigris.org/>
>
> See the ai302 subdirectory.
>
> The vector containers in the ai302 subdirectory conform to the most
> recent AI-302 draft (dated 2004/04/29). Look for updates to the
> remaining containers this weekend. (I recommend simply joining the
> charles project mailing lists, so you get notified automatically.)
Great. Thanks.
****************************************************************
From: Ehud Lamm
Sent: Sunday, May 16, 2004 4:59 AM
> An implementation could partially or entirely ignore the value of
> Expansion_Factor, if there were better criteria for it to base the decision
> on. Since it has a default value, it does not get in the way of the user who
> doesn't want to use it.
This makes sense to me. That's the way I usually do it.
****************************************************************
From: Nick Roberts
Sent: Saturday, May 15, 2004 8:48 AM
> > Matt shows that it can be done by jumping through hoops, so
> > there is a way to do it in the rare case that it is needed.
>
> Just to add to what Randy said: the point of Resize is to prevent
> automatic expansion that would otherwise occur as items are inserted
> into the container. It's not influencing the size that's important per
> se; rather, it's disabling expansion.
>
...
Okay, but it would be way easier to be able to use one call to Resize
instead!
> Note that I've been an STL user for 4 years now, and I've never actually
> had a need to shrink a vector. Most of the time I use a vector to store
> a large index or whatever, and usually I can determine prior to
> insertion how many items I'm going to insert, so I call Resize first.
Hmm. I think perhaps what you're missing is the case where: (a) you don't
know in advance what size is going to be required; (b) you want to Resize
the vector to something big, so as to minimise (eliminate) reallocations. I
think this is a fairly common scenario. In this kind of case, the user knows
the length of the vector after it has been populated, and would probably
like to be able to issue a simple Resize afterwards to change the size of
the vector to its length (eliminating wasted space). E.g.:
Open(File,...);
Resize(Vector,100_000);
while not End_of_File(File) loop
Read(File,X);
Append(Vector,X);
end loop;
Close(File);
Resize(Vector,Length(Vector));
Does this not make sense?
****************************************************************
From: Matthew Heaney
Sent: Saturday, May 15, 2004 11:40 AM
> Okay, but it would be way easier to be able to use one call
> to Resize instead!
Right now Resize has the same semantics as reserve() does in the STL.
You might want to post a note on comp.lang.c++ asking about reserve()
(and its associated function capacity()). You might also want to
send your question to Musser, Plauger, or Scott Meyers to get their
opinion.
> Hmm. I think perhaps what you're missing is the case where:
> (a) you don't know in advance what size is going to be
> required; (b) you want to Resize the vector to something big,
> so as to minimise (eliminate) reallocations. I think this is
> a fairly common scenario.
In that case I would use a std::deque, not a std::vector, if the number
of elements is large and I need population of the container to be as
fast as possible.
(I had included a deque container in my original proposal, but removed
it after the ARG asked me to reduce its size. We should revisit this if
there's ever a secondary container library standard.)
> In this kind of case, the user
> knows the length of the vector after it has been populated,
> and would probably like to be able to issue a simple Resize
> afterwards to change the size of the vector to its length
> (eliminating wasted space). E.g.:
>
> Open(File,...);
> Resize(V,100_000);
> while not End_of_File(File) loop
> Read(File,X);
> Append(V,X);
> end loop;
> Close(File);
> Resize(V,Length(V));
>
> Does this not make sense?
Read the file into a temporary vector (which has been resized as above),
and then assign it to the real vector V.
The moral of the story is that you can shrink a vector. We're only
disagreeing about the syntax.
(Note that what I do mostly involves .avi files, which have a header
describing how many frames are in the file. So in my case I read in the
avi stream header first, and then resize the vector based on the
information in the header.)
****************************************************************
From: Nick Roberts
Sent: Monday, May 17, 2004 2:47 PM
"Matthew Heaney" <matthewjheaney@earthlink.net> wrote:
> > Okay, but it would be way easier to be able to use one call
> > to Resize instead!
>
> Right now Resize has the same semantics as reserve() does in the STL.
> You might want to post a note on comp.lang.c++ asking about reserve()
> (and its associated function capacity()). You might also want to
> send your question to Musser, Plauger, or Scott Meyers to get their
> opinion.
I must say, that seems like a very evasive answer. Can you not give a direct
answer to the question "Why not permit Resize to reduce the size of a
vector?" Why does the Ada standard need to do what the STL does?
> > Hmm. I think perhaps what you're missing is the case where:
> > (a) you don't know in advance what size is going to be
> > required; (b) you want to Resize the vector to something big,
> > so as to minimise (eliminate) reallocations. I think this is
> > a fairly common scenario.
>
> In that case I would use a std::deque, not a std::vector, if the number
> of elements is large and I need population of the container to be as
> fast as possible.
>
> (I had included a deque container in my original proposal, but removed
> it after the ARG asked me to reduce its size. We should revisit this if
> there's ever a secondary container library standard.)
In which case, I must ask what is the point of providing the vector
abstraction at all? What does it provide that is not bettered, in practice,
either by Ada's intrinsic arrays or by the list abstraction?
> ...
> The moral of the story is that you can shrink a vector. We're only
> disagreeing about the syntax.
Yes, we are disagreeing about the syntax. I am suggesting that the syntax:
Resize(V,Length(V));
is a big improvement upon:
Shrink:
declare
Temp : Vector := V;
begin
Clear (V);
Move (Target => V, Source => Temp);
end Shrink;
and I do not see -- and I have not been given -- any reason why the former
should not be permitted.
> (Note that what I do mostly involves .avi files, which have a header
> describing how many frames are in the file. So in my case I read in the
> avi stream header first, and then resize the vector based on the
> information in the header.)
In which case, why do you not simply use an array?
****************************************************************
From: Randy Brukardt
Sent: Monday, May 17, 2004 5:07 PM
> In which case, I must ask what is the point of providing the vector
> abstraction at all? What does it provide that is not bettered, in practice,
> either by Ada's intrinsic arrays or by the list abstraction?
Because Matt is very tied (in his mind) to a particular implementation. The
containers as described in AI-302-03 are much more abstract, and do not have
a prescribed implementation. Janus/Ada will probably use a two-level
implementation for vector (which is more like what Matt calls a "Deque"),
because the extra cost of such an implementation is quite low in return for
the benefits that will be available. (It also maps much better to the
code-shared generics of Janus/Ada).
...
> > (Note that what I do mostly involves .avi files, which have a header
> > describing how many frames are in the file. So in my case I read in the
> > avi stream header first, and then resize the vector based on the
> > information in the header.)
>
> In which case, why do you not simply use an array?
I've made this point many times, and you're never going to get a
satisfactory answer. It's best to let it go. (Otherwise, we'll use up the
entire budget for Ada 2005 discussing trivialities, and there will not be
any money to build the RM...)
Earlier, Nick wrote:
> This seems to correspond with the idea of having something like:
> Expansion_Factor: Float := 2.0;
> as a generic parameter.
This is very specific to a particular implementation. We don't want that
much specification of the implementation.
...
> An implementation could partially or entirely ignore the value of
> Expansion_Factor, if there were better criteria for it to base the decision
> on. Since it has a default value, it does not get in the way of the user who
> doesn't want to use it.
We don't want a parameter whose value can be ignored. Resize itself is bad
enough.
In any event, micro-managing memory use is not what containers are about.
You use them when you want the system to manage memory for you. If you care
deeply about memory use, you need to build your own abstractions. If you
don't, a compiler update could completely destroy your system's performance.
(You can't rely on predefined stuff for critical time/space performance.)
...
> I suspect, with respect, that you are being a bit hopeful if you expect
> implementations to use blocking, caching, or other optimisations. I doubt
> that many will, in practice. And with an implementation close to the model,
> there would be no difficulty in shrinking (by reallocation and copying, as
> for enlargement). Actually, I think shrinking would probably be feasible for
> most implementations, maybe all.
IBM Rational insisted on weakening some of the requirements so that they
could use alternative implementations. Similarly, I've been very concerned
about specifying an implementation, simply because Matt's implementations
would be outrageously slow if compiled for Janus/Ada (due to generic code
sharing). I fully intend to use a two-level scheme for vectors. All of the
containers will use limited free lists to avoid excess allocation. I've
considered allocation blocking for lists (but it wouldn't work for
Janus/Ada, so we won't do that). Now, some vendors may simply use Matt's
implementations, but it's pretty clear that at least some vendors are not
planning to do so.
> Hmm. Well, I intended the 'slot' to be an abstract (model) concept, and you
> could even say that in the description. I do really think it could
> significantly clarify the descriptions. I could do some actual wording, if
> you wish.
But we don't need it! Containers just hold a number of elements; all else is
specific to particular implementations, and does not really belong in the
standard. We made a number of specific exceptions to that to allow inserting
of empty elements into vectors for performance reasons (similar to the
reason that Resize exists). Those do not need any wrapper concept.
As I said, Matt's original text had "nodes" in many places, and I took them
out as much as possible. It generally shortened the wording; there were no
cases where it helped anything. (It's more useful in the List container, but
even there, it would be best to remove it. Just no more energy or budget.)
And no thanks, I don't have any energy or budget to spend training someone
how to write Standard wording. Especially when it isn't necessary. I don't
doubt that there exist paragraphs that need wordsmithing, but I think the
overall wording is about on the right level.
****************************************************************
From: Pascal Obry
Sent: Tuesday, May 18, 2004 12:55 AM
> Because Matt is very tied (in his mind) to a particular implementation. The
> containers as described in AI-302-03 are much more abstract, and do not have
> a prescribed implementation.
This is not true for the map. The name is Indefinite_Hashed_Maps. This states
clearly that the implementation uses a hash table. I have found that while a
hash table is very fast for small sets of data (< 10_000), it is considerably
slower than an AVL tree for very large sets of data (> 100_000). Maybe this is
specific to the current reference implementation, but that's what I have
experienced. FYI, the AVL implementation I'm talking about is
Table_Of_*_And_Dynamic_Data_G from the LGL.
****************************************************************
From: Randy Brukardt
Sent: Friday, May 19, 2004 7:14 PM
The "Hashed_Maps" and "Ordered_Sets" cases are special. I think everyone would
have preferred to avoid specifying an implementation there as well. But that's
impossible, because of the vastly different generic parameters needed. That is,
a "Hashed_Map" takes a hash function as a generic parameter, while an
"Ordered_Set" (implemented as a tree) takes generic ordering operators as
generic parameters. So, that exposes the basic implementation, as does any
ordering requirements (hash tables aren't ordered by definition). Given these
basic properties differ, a container where the hash vs. tree implementation
isn't specified doesn't make sense.
I do have to wonder about your results. Since an AVL tree is going to be log N
access by key, it should be quite a bit slower in large collections. The only
reason for a hash table to slow down is a bad hash function (which then could
make long chains in a few buckets) - essentially turning lookups into brute
force searches. Are you sure that your hash function is good enough for "large
sets of data"? An ideal function would put one item into each bucket.
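[Editor's note: A small sketch of the point about generic parameters, assuming the Ada.Containers packages as later standardized rather than the draft discussed here. The hashed map is instantiated with a hash function and a key equivalence, the ordered map with an ordering relation, and only the ordered one promises an iteration order. The names Map_Formals_Demo, Hashed and Ordered are illustrative only.]

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Map_Formals_Demo is

   --  A hashed map needs a hash function and a key equivalence.
   package Hashed is new Ada.Containers.Indefinite_Hashed_Maps
     (Key_Type        => String,
      Element_Type    => Integer,
      Hash            => Ada.Strings.Hash,
      Equivalent_Keys => "=");

   --  An ordered map needs an ordering relation instead.
   package Ordered is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type     => String,
      Element_Type => Integer,
      "<"          => "<");

   H : Hashed.Map;
   O : Ordered.Map;

begin
   H.Insert ("beta", 2);
   H.Insert ("alpha", 1);
   O.Insert ("beta", 2);
   O.Insert ("alpha", 1);

   --  Lookup by key reads the same for both containers.
   pragma Assert (H.Element ("alpha") = 1);
   pragma Assert (O.Element ("alpha") = 1);

   --  Only the ordered map guarantees that First is the smallest key.
   pragma Assert (Ordered.Key (O.First) = "alpha");

   Ada.Text_IO.Put_Line ("ok");
end Map_Formals_Demo;
```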
****************************************************************
From: Pascal Obry
Sent: Saturday, May 22, 2004 2:04 AM
The hash routine was not good at all. We have discussed this with Matthew;
using a standard one (close to the hash routine used to implement associative
arrays in Tcl or Gawk), the hash table is now two times faster.
****************************************************************
From: Michael F. Yoder
Sent: Saturday, May 22, 2004 11:40 AM
I've seen bad behavior with hashing many times, both in personal and
professional contexts. The basic reason is: if you use a fixed table
size and linear chaining within a bucket, hashing is linear (albeit with
a small constant) and large datasets can perform very badly even if the
hash function is good. I don't recall the problem ever being a bad hash
function, though it could have occurred and I've forgotten.
My own solution was to expand the table size when it becomes 3/4 full or
so (using internal rather than external chaining); it might be better to
make each bucket be a tree. The latter solution has a security benefit:
it mitigates DOS attacks based on causing collisions deliberately. This
consideration occurred at my last job, but admittedly isn't a common
one. For what it's worth, the use of an expanding table has always
solved the problem.
****************************************************************
From: Tucker Taft
Sent: Saturday, May 22, 2004 3:42 PM
The Hash_Maps are intended to be expandable hash tables.
That's what Resize() is all about. And yes, I expect the only
reason AVLs might start to outperform a hash table is if the
hash table has a fixed number of buckets.
****************************************************************
From: Randy Brukardt
Sent: Saturday, May 22, 2004 3:46 PM
The containers library uses an expanding hash table. The only way the
behavior can get bad is if the hash function isn't good enough to use most
of the buckets.
****************************************************************
From: Ehud Lamm
Sent: Sunday, May 16, 2004 4:54 AM
> Tucker wrote:
>
> > I guess I am still not convinced. If you use a binary
> > tree, having a cursor pointing into the tree is not
> > always terribly useful when you are trying to search
> > for some subsequent element with a given key. You will
> > often have to go back "up" several levels before being
> > able to go back down. With "Slice" you are forcing
> > every operation to support a "virtual" subset as well
> > as a real set. This is going to inevitably introduce
> > some distributed overhead. I would be surprised if on
> > balance, this is a net savings. I'm sure you could
> > construct a case where it would be a savings, but overall,
> > I would expect the mix of uses would favor keeping the
> > abstraction simpler.
>
> I totally agree. Moreover, there is overhead from requiring every
> implementation of Sets to support by-reference, not copied
> set objects. (That is, the result of Slice).
This is also the way I see it.
Perhaps I missed something, so let me put it bluntly: are we talking ADT
interfaces here, or are we working solely for a specific implementation?
As you know from our Ada-Europe workshop a couple of years ago, I am firmly
in the ADT camp myself, so I prefer interfaces that don't impose too many
implementation restrictions. They can then be extended at will -- much easier
than removing operations that are hard or inefficient to support.
****************************************************************
From: Marius Amado Alves
Sent: Monday, May 17, 2004 1:01 PM
[Slice et al.]
> Perhaps I missed something, so let me put it bluntly: are we talking ADT
> interfaces here, or are we working solely for a specific implementation?
Both. Slice provides a way to express ranges declaratively (interface) and a
way to pass information to operations that can use it to optimize
(implementation, but not specific).
(Just clarifying. The cases have been made, the tendency of the ARG is to
leave Slice out, so it's only academic now.)
****************************************************************
From: Ehud Lamm
Sent: Sunday, May 16, 2004 5:03 AM
> From: Matthew Heaney [mailto:mheaney@on2.com]
>
> Tucker Taft wrote:
>
> > I don't think we need to change
> > "Previous" to make these equivalences work for
> > endpoints. Just let the user write a
> > "Previous_Or_Last" if they really want to,
> > which would need to take both a cursor and a set.
> > Or more directly, write Lower_Limit or Upper_Limit
> > if you want them, since these already have enough
> > information with the set and the key.
> >
> > Providing Ceiling and Floor still seems adequate to me,
> > as they provide the needed primitives for all other
> > operations mentioned thus far.
>
> OK. That seems reasonable. I just wanted to make sure we
> were on the
> same page w.r.t the behavior at the endpoints.
It does seem reasonable, and since I have never used this sort of operation, my
opinion shouldn't count as much, so take this with a grain of salt...
It looks like the equivalences help understand what's going on. The special
cases make code less readable and the logic a bit less clear. How important
this is, is hard to judge.
I wager many students will forget about the special case. Why not provide
Lower_Limit or Upper_Limit? The cost seems tiny.
****************************************************************
From: Matthew Heaney
Sent: Monday, May 17, 2004 1:15 PM
I am in favor of providing the following four operations:
Lower_Limit (S, K)  <  K   (AKA "Ground", "Previous_Floor")
Floor (S, K)        <= K
Ceiling (S, K)      >= K   (AKA Lower_Bound)
Upper_Limit (S, K)  >  K   (AKA Upper_Bound, "Roof", "Next_Ceiling")
I think Tucker only wants the middle two.
If I had to pick only two, then I'd pick the last two (Ceiling and
Upper_Limit). (This is what the STL & Charles do, and what was in the
API prior to the ARG meeting in Phoenix.)
Note that there are really two separate issues:
(1) What is the value of the expression:
Previous (Next (C))
We got rid of the internal sentinel node in Phoenix, which means once a
cursor assumes the value No_Element, then it keeps that value.
This is what Tucker and I were discussing in the earlier message quoted
above, about letting a user define a Previous_or_Last function if he
needs to back up onto the actual sequence.
(2) Restoring the functionality of the two operations formerly known as
"Lower_Bound" and "Upper_Bound".
There seems to be agreement that this functionality is useful. One of
the issues is that several of the ARG reviewers were confused by the
names "Lower_Bound" and "Upper_Bound".
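[Editor's note: A sketch of how the two strict operations could be written by a user on top of Floor and Ceiling, which is the sense in which those two serve as primitives. It assumes Ada.Containers.Ordered_Sets as later standardized. Lower_Limit and Upper_Limit below are hypothetical user-written helpers, not operations of the package, and the comparison with K uses plain equality, which is adequate for Integer elements.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Limits_Demo is

   package Int_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Int_Sets;

   --  Greatest element strictly less than K, or No_Element.
   function Lower_Limit (S : Set; K : Integer) return Cursor is
      C : constant Cursor := Floor (S, K);
   begin
      if Has_Element (C) and then Element (C) = K then
         return Previous (C);
      else
         return C;
      end if;
   end Lower_Limit;

   --  Smallest element strictly greater than K, or No_Element.
   function Upper_Limit (S : Set; K : Integer) return Cursor is
      C : constant Cursor := Ceiling (S, K);
   begin
      if Has_Element (C) and then Element (C) = K then
         return Next (C);
      else
         return C;
      end if;
   end Upper_Limit;

   S : Set;

begin
   S.Insert (10);
   S.Insert (20);
   S.Insert (30);

   pragma Assert (Element (Lower_Limit (S, 20)) = 10);
   pragma Assert (Element (Lower_Limit (S, 25)) = 20);
   pragma Assert (Lower_Limit (S, 10) = No_Element);
   pragma Assert (Element (Upper_Limit (S, 20)) = 30);
   pragma Assert (Upper_Limit (S, 30) = No_Element);

   Ada.Text_IO.Put_Line ("ok");
end Limits_Demo;
```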
****************************************************************
From: Tucker Taft
Sent: Monday, May 17, 2004 1:49 PM
Will this never end? ;-)
My *major* complaint with Upper_Limit, Lower_Limit,
Upper_Bound, Lower_Bound, etc. is that the names
make no intuitive sense.
If you could come up with some reasonable names,
I might support the inclusion. I do not find
any of the ones that have been proposed thus far
acceptable.
Predecessor and Successor might make it, where they
are allowed to take a key that might or might
not appear in the set, and return the cursor for
the item in the set next preceding or following the given key.
****************************************************************
From: Michael F. Yoder
Sent: Wednesday, May 19, 2004 11:48 AM
Whether 2 or 4 operations are included, it would be pleasant if the
names came from a consistent scheme. For example:
Lt_Item (S, K)  <  K
Le_Item (S, K)  <= K
Gt_Item (S, K)  >  K
Ge_Item (S, K)  >= K
This is easier to do if the "Lt" and "Gt" operations are the only two
provided. For example, 'Predecessor' and 'Successor' would be fine.
Floor for Le_Item and Ceiling for Ge_Item, together with Predecessor and
Successor, would be acceptable.
****************************************************************
From: Christoph Grein
Sent: Sunday, May 23, 2004 11:37 PM
I do think the names at the right intuitively describe the meaning:
Gt_Item (S, K)  >  K   Roof
Ge_Item (S, K)  >= K   Ceiling
Le_Item (S, K)  <= K   Floor
Lt_Item (S, K)  <  K   Ground, Basement
It's like a building, you're in a room, which has a floor and a ceiling; above
is the roof (or the attic), below the basement or ground.
****************************************************************
From: Marius Amado Alves
Sent: Monday, May 24, 2004 5:26 AM
":=" for containers clones the source (as opposed to passing a reference
to it).
Do I understand correctly that this behaviour is specified solely by the
fact that containers are non-limited?
In that case, wouldn't a small clarifying Note be useful, especially for new
users coming e.g. from... uh... Java...
And shouldn't the behaviour of ":=" be documented for any controlled type
anyway?
****************************************************************
From: Matthew Heaney
Sent: Wednesday, June 9, 2004 9:40 AM
I have a few comments on the Phoenix release of AI-302 (2004-04-29
AI95-00302-03/03). Each comment is bracketed with "MJH:" and "ENDMJH."
pairs, and immediately follows the item to which it refers.
-Matt
A.17.2 The Package Containers.Vectors
generic
...
package Ada.Containers.Vectors is
...
function To_Vector (Count : Size_Type) return Vector;
MJH:
I wasn't absolutely sure whether the formal param should be named
"Count" or "Length". The term "count" is used elsewhere in this spec,
but here it actually specifies the length of the vector object returned
by the function.
ENDMJH.
function To_Vector
  (New_Item : Element_Type;
   Count    : Size_Type) return Vector;
MJH:
This is formatted inconsistently. It should be:
function To_Vector (New_Item : Element_Type;
                    Count    : Size_Type)
   return Vector;
ENDMJH.
procedure Set_Length (Container : in out Vector;
Length : in Size_Type);
MJH:
Should we include the following operation too?
procedure Set_Length (Container : in out Container_Type;
Length : in Size_Type;
New_Item : in Element_Type);
This would allow the user to specify an actual value for the new
elements, if the length of the vector is increased.
ENDMJH.
procedure Swap (Container : in out Vector;
I, J : in Cursor);
MJH:
Should we weaken the precondition, allowing the case in which both I and
J have the value No_Element? In that case Swap would be a no-op.
(Right now I think it's an error.)
ENDMJH.
function To_Index (Position : Cursor) return Index_Type'Base;
If Position is No_Element, Constraint_Error is propagated. Otherwise, the
index (within its containing vector) of the element designated by Cursor is
returned.
MJH:
Should this be reworded to say "If Has_Element (Position) is False..."?
ENDMJH.
MJH:
Also, note that if Position may only designate an active element in the
container, then we don't need to return Index_Type'Base. We can
strengthen the post-condition by returning Index_Type.
ENDMJH.
AARM Note: This implies that the index is determinable from a bare cursor
alone. The basic model is that a vector cursor is implemented as a record
containing an access to the vector container and an index value. This does
constrain implementations, but it also allows all of the cursor operations
to be defined in terms of the corresponding index operation (which should be
primary for a vector).
MJH:
It's not clear if CE is supposed to be propagated if Position does not
specify a value within the range of currently active elements of
Container. For example:
declare
V : Vector;
C : Cursor;
I : Index_Type'Base;
begin
Append (V, E);
C := First (V);
Delete_First (V);
I := To_Index (C); --valid?
end;
ENDMJH.
generic
with procedure Process (Element : in out Element_Type) is <>;
procedure Generic_Update_by_Index (Container : in Vector;
Index : in Index_Type'Base);
If Index is not in the range
First_Index (Container) .. Last_Index (Container),
then Constraint_Error is propagated. Otherwise, it calls the generic actual
bound to Process with the element at position Index as the parameter. Any
exceptions raised by Process are propagated.
If Element_Type is unconstrained and definite, then the Element parameter
shall be unconstrained.
AARM Note: This means that the elements cannot be aliased nor directly
allocated from the heap; it must be possible to change the discriminants
of the element in place.
The element at position Index is not an empty element after successful
completion of this operation.
AARM Note: Since reading an empty element is a bounded error, attempting to
use this procedure to replace empty elements may fail. Use Replace_Element
to do that reliably.
MJH:
What did we conclude about this? I thought using Generic_Update to
initialize a space element was ok? (Or was that only for a list?)
Is this AARM Note in conflict with the note below?
ENDMJH.
procedure Replace_Element (Position : in Cursor;
By : in Element_Type);
This procedure assigns the value By to the element designated by Position.
If Position equals No_Element, then Constraint_Error is propagated.
Any exceptions raised during the assignment are propagated. The element
designated by Position is not an empty element after successful completion of
this operation.
AARM Note: Replace_Element, Generic_Update, and Generic_Update_by_Index are the
only ways that an element can change from empty to non-empty.
MJH:
Is this AARM Note in conflict with the note above?
ENDMJH.
procedure Insert (Container : in out Vector;
Before : in Cursor;
New_Item : in Vector);
If Before is No_Element, then is equivalent to Insert (Container,
Index_Type'Succ (Last_Index (Container)), New_Item); otherwise is
equivalent to Insert (Container, To_Index (Before), New_Item);
MJH:
Should this be reworded to say "Has_Element (Before) = False..." instead?
ENDMJH.
MJH:
We probably need to say here that if New_Item is empty, then the
operation has no effect. Otherwise there's a constraint check that can fail
if Before=No_Element (IT'Succ (Cont.Last) fails when Cont.Last=IT'Last).
ENDMJH.
MJH:
Here and elsewhere the equivalence is in terms of To_Index, but this
might be too restrictive. Before is allowed to be IT'Succ (Cont.Last),
but I think To_Index raises an exception if it has that value.
ENDMJH.
procedure Insert (Container : in out Vector;
Before : in Cursor;
New_Item : in Vector;
Position : out Cursor);
Create a temporary (call it Temp_Index) and set it to Index_Type'Succ
(Last_Index (Container)) if Before equals No_Element, and To_Index (Before)
otherwise. Then Insert (Container, Before, New_Item) is called, and finally
Position is set to To_Cursor (Container, Temp_Index).
AARM Note: The messy wording is because Before is invalidated by Insert, and we
don't want Position to be invalid after this call. An implementation probably
only needs to copy Before to Position.
MJH:
See note above.
ENDMJH.
procedure Insert (Container : in out Vector;
Before : in Cursor;
New_Item : in Element_Type;
Count : in Size_Type := 1);
Equivalent to Insert (Container, Before, To_Vector (New_Item, Count));
MJH:
See note above when Count = 0. (We should state explicitly that if
Count=0, then the operation is a no-op, and there are no constraint
checks or any other exceptions. The value or state of cursor Before is
not checked or otherwise considered, when Count=0.)
ENDMJH.
procedure Insert (Container : in out Vector;
Before : in Cursor;
New_Item : in Element_Type;
Position : out Cursor;
Count : in Size_Type := 1);
Equivalent to Insert (Container, Before, To_Vector (New_Item, Count),
Position);
MJH:
See note above re count=0.
ENDMJH.
procedure Prepend (Container : in out Vector;
New_Item : in Vector;
Count : in Size_Type := 1);
Equivalent to Insert (Container, Index_Type'First, New_Item).
MJH:
Typo: this declaration should look like this:
procedure Prepend (Container : in out Vector;
New_Item : in Vector);
ENDMJH.
procedure Insert_Space (Container : in out Vector;
Before : in Cursor;
Position : out Cursor;
Count : in Size_Type := 1);
Create a temporary (call it Temp_Index) and set it to
Index_Type'Succ (Last_Index (Container)) if Before equals No_Element, and
To_Index (Before) otherwise. Then Insert_Space (Container, Temp_Index,
Count) is called, and finally Position is set to To_Cursor (Container,
Temp_Index).
MJH:
See note above re count=0.
ENDMJH.
procedure Delete (Container : in out Vector;
Position : in out Cursor;
Count : in Size_Type := 1);
If Count is 0, the operation has no effect. Otherwise is equivalent to
Delete (Container, To_Index (Position), Count).
MJH:
Here and elsewhere when Count is 0, I think we need to specify what
value for Position is returned.
ENDMJH.
MJH:
If Count is non-zero, then how should we handle a Position that does
not designate an active element? Above, we raise CE. Is this correct?
ENDMJH.
MJH:
We probably need to say here that Position is set to Position.Index if
Index continues to designate an element in the container, or No_Element
if Position was part of the entire tail that was deleted.
ENDMJH.
procedure Delete_Last (Container : in out Vector;
Count : in Size_Type := 1);
If Length (Container) < Count then is equivalent to
Delete (Container, Index_Type'First, Count); otherwise
is equivalent to Delete (Container,
Index_Type'Val(Index_Type'Pos(Last_Index(Container)) - Count + 1), Count).
MJH:
If Length (C) <= Count, then isn't it easier to simply say that it's
the same as Clear (C)?
ENDMJH.
Returns the value Index_Type'First.
MJH:
What operation does this description refer to? I assume it's First_Index.
ENDMJH.
procedure Swap (Container : in Vector;
I, J : in Cursor);
Equivalent to Swap (Container, To_Index (I), To_Index (J)).
MJH:
I mentioned this above. We might want to weaken the precondition of
Swap, to allow cursors both of which Has_Element returns False to be
swapped; that is, if both are No_Element, then Swap should be a no-op.
ENDMJH.
function Find (Container : Vector;
Item : Element_Type;
Index : Index_Type'Base := Index_Type'First)
return Index_Type'Base;
Searches the elements of Container for an element equal to Item,
starting at position Index. If Index is less than Index_Type'First,
then Constraint_Error is propagated. If there are no elements in the
range Index .. Last_Index (Container) equal to Item, then Find returns
Index_Type'Succ (Last_Index (Container)). Otherwise, it returns the index of
the matching element.
MJH:
Here and in the other find ops we should probably weaken the precondition,
such that if the container is empty, we return failure status
immediately, without vetting or otherwise interrogating the value of Index.
ENDMJH.
function Find (Container : Vector;
Item : Element_Type;
Position : Cursor := No_Element)
return Cursor;
Searches the elements of Container for an element equal to Item,
starting at the first element if Cursor equals No_Element, and at
the element designated by Cursor otherwise, and searching to the last
element in Container. If an item equal to Item is found, Find returns a
cursor designating the first element found equal to Item. If no such item is
found, it returns No_Element.
MJH:
Suppose Has_Element (Position) = False, is this an error (raise CE), or
does it count as No_Element (start from IT'First)?
ENDMJH.
A.17.3 The Package Containers.Doubly_Linked_Lists
procedure Delete (Container : in out List;
Position : in out Cursor;
Count : in Size_Type := 1);
If Position equals No_Element, the operation has no effect. Otherwise
Delete removes Count nodes starting at the node designated by Position
from Container (or all of the nodes if there are less than Count nodes
starting at Position). Any exceptions raised during deallocation of internal
storage are propagated.
MJH:
Is this inconsistent with vector? I think we made it an error if
Size > 0 and Position = No_Element. (I don't know which way we should
go, I just wanted to bring it up.)
ENDMJH.
procedure Swap (Container : in out List;
I, J : in Cursor);
Swap exchanges the nodes designated by I and J.
MJH:
Allow I and J to both assume the value No_Element?
ENDMJH.
MJH:
Does this swap nodes (by exchanging pointers), or does it
leave the nodes in their relative positions, and merely
exchange the values of the elements on those nodes?
ENDMJH.
A.17.5 The Package Containers.Ordered_Sets
generic
...
package Ada.Containers.Ordered_Sets is
...
procedure Insert (Container : in out Set;
New_Item : in Element_Type;
Position : out Cursor;
Success : out Boolean);
--MJH:
--A nice function might be:
--procedure Insert (Container : in out Set;
-- New_Item : in Element_Type);
--This is a convenience function that omits the last two params.
--ENDMJH.
function Is_Subset (Item : Set;
Container : Set)
return Boolean;
MJH:
Clarify the results when one or both of the params are empty sets. (I
assume that in set theory, the subset operation is defined on a pair
of null sets, but I don't remember offhand what the value is.)
ENDMJH.
function Is_Disjoint (Item : Set;
Container : Set)
return Boolean;
MJH:
As above, clarify the results when one or both of the params are empty sets.
ENDMJH.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, June 9, 2004 11:59 PM
A couple of comments on Matt's comments (I'm not going to comment on typos
and the like, it's too late to fix them before the meeting, and they're
recorded).
> function To_Index (Position : Cursor) return Index_Type'Base;
>
> If Position is No_Element, Constraint_Error is propagated. Otherwise, the
> index (within its containing vector) of the element designated by
> Cursor is
> returned.
>
> MJH:
> Should this be reworded to say "If Has_Element (Position) is False..."?
> ENDMJH.
I don't think so. It's usually a bounded error to use a cursor that doesn't
point at an active element. That allows either raising Constraint_Error or
doing something else. You explain why below...
...
> MJH:
> It's not clear if CE is supposed to be propagated if Position does not
> specify a value within the range of currently active elements of
> Container. For example:
>
> declare
> V : Vector;
> C : Cursor;
> I : Index_Type'Base;
> begin
> Append (V, E);
> C := First (V);
> Delete_First (V);
> I := To_Index (C); --valid?
> end;
> ENDMJH.
It's very clear that this is a bounded error, and we're *not* requiring
implementations to detect this case (in this specific example, because
Delete is called on an element to the left). But we *allow* it to be
detected. I thought we had agreed that we didn't want the overhead of
detecting these kinds of errors.
The organization of the standard requires us to put the bounded error text
far away from this subprogram (which is unfortunate), but since it is a
general rule, that isn't too bad.
The bounded error rules apply to *all* uses of cursors except Has_Element,
so the answer is the same for all other routines.
> generic
> with procedure Process (Element : in out Element_Type) is <>;
> procedure Generic_Update_by_Index (Container : in Vector;
> Index : in Index_Type'Base);
...
> MJH:
> What did we conclude about this? I thought using Generic_Update to
> initialize a space element was ok? (Or was that only for a list?)
It's also in the bounded error section. I think we concluded that we
couldn't allow Generic_Update, because it implies a read of the element. I
tried to find a way to avoid that, but if we did, then it wouldn't be
"Update" any more.
...
> AARM Note: Replace_Element, Generic_Update, and
> Generic_Update_by_Index are
> only ways that an element can change from empty to non-empty.
>
> MJH:
> Is this AARM Note in conflict with the note above?
> ENDMJH.
Someone asked that in April. Sheesh. Generic_Update is in the list because
it's a bounded error to call it, and *if* it doesn't raise an exception,
*then* it changes the element to non-empty. But you can't depend on it not
raising an exception.
...
> procedure Delete (Container : in out List;
> Position : in out Cursor;
> Count : in Size_Type := 1);
>
> If Position equals No_Element, the operation has no effect. Otherwise
> Delete removes Count nodes starting at the node designated by Position
> from Container (or all of the nodes if there are less than Count nodes
> starting at Position). Any exceptions raised during deallocation
> of internal storage are propagated.
>
> MJH:
> Is this inconsistent with vector? I think we made it an error if
> Size > 0 and Position = No_Element. (I don't know which way we should
> go, I just wanted to bring it up.)
> ENDMJH.
Yes, it seems to be inconsistent with Vector. Vector raises C_E for indexes
out of range (of course), and the cursor version mimics that behavior,
because it really can't do anything else. So I'd say this probably ought to
raise C_E as well.
****************************************************************
From: Matthew Heaney
Sent: Thursday, June 10, 2004 10:32 AM
> It's very clear that this is a bounded error, and we're *not* requiring
> implementations to detect this case (in this specific example, because
> Delete is called on an element to the left). But we *allow* it to be
> detected. I thought we had agreed that we didn't want the overhead of
> detecting these kinds of errors.
OK, I just wanted to make sure.
The other thing I forgot to mention is that the following operations are
in the list package but not the vector package:
procedure Delete (Container : in out List;
Item : in Element_Type);
generic
with function Predicate (Element : Element_Type)
return Boolean is <>;
procedure Generic_Delete (Container : in out List);
procedure Reverse_List (Container : in out List);
generic
with function Predicate (Element : Element_Type)
return Boolean is <>;
function Generic_Find (Container : List;
Position : Cursor := No_Element)
return Cursor;
generic
with function Predicate (Element : Element_Type)
return Boolean is <>;
function Generic_Reverse_Find (Container : List;
Position : Cursor := No_Element)
return Cursor;
There's no technical reason they should be in the list but not the
vector. Either we can add them to vector, or get rid of them for list.
Here's another idea. We already have a Generic_Update, but another
useful operation might be some kind of query operation, that either
returns Boolean or a type you pass in as a generic formal. Something like:
generic
type Result_Type (<>) is limited private;
with function Process (E : ET) return Result_Type is <>;
function Generic_Query (Position : Cursor) return Result_Type;
Of course, a user could implement this as (here, for a Boolean Result_Type):
function Query (P : C) return Boolean is
Result : Boolean;
procedure Process (E : in out ET) is
begin
Result := Predicate (E); -- some algorithm
end;
procedure Update is new Generic_Update;
begin
Update (P);
return Result;
end;
The awkward case is when the Result_Type actual type is indefinite. For
example, were it type String you would have to use an unbounded_string
or whatever as the temporary (but maybe that's not such a big deal).
Clearly you can implement a query-style function from the update
modifier operation, but I wasn't sure whether that's possible in all
cases for all possible return types, and if so whether this warrants the
introduction of a dedicated operation.
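[Editor's note: The query-from-update pattern described above, including the
indefinite-result case, can be sketched as a compilable unit. This sketch
assumes the Ada.Containers.Vectors package as later standardized, where
Update_Element takes the place of Generic_Update; the names Query_Sketch and
Image_Of are hypothetical and are not part of any proposal in this thread.]

```ada
with Ada.Containers.Vectors;
with Ada.Strings.Unbounded;
with Ada.Text_IO;

procedure Query_Sketch is
   package SU renames Ada.Strings.Unbounded;

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Int_Vectors;

   V : Vector;

   --  Hypothetical query built on top of the update operation.
   --  The result type (String) is indefinite, so an Unbounded_String
   --  serves as the temporary, as the message above suggests.
   function Image_Of (P : Cursor) return String is
      Result : SU.Unbounded_String;

      procedure Process (E : in out Integer) is
      begin
         Result := SU.To_Unbounded_String (Integer'Image (E));
      end Process;
   begin
      V.Update_Element (P, Process'Access);
      return SU.To_String (Result);
   end Image_Of;

begin
   V.Append (42);
   Ada.Text_IO.Put_Line (Image_Of (V.First));
end Query_Sketch;
```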
****************************************************************
From: Randy Brukardt
Sent: Thursday, June 10, 2004 6:50 PM
...
> There's no technical reason they should be in the list but not the
> vector. Either we can add them to vector, or get rid of them for list.
I'd be wary of adding too many rarely used routines to these containers.
Those just make the containers harder to learn and harder to implement with
little additional benefit.
Unbounded_Strings has a large number of rarely used routines, and yet it
never seems to have the odd routine I actually need. So, that actually
increases the frustration level, because you'd think that in so many
routines, every plausible need would be met. When there are fewer routines,
the expectation level is lower, too, and you wouldn't feel quite so
ripped-off.
In the routines you mentioned, I think that the generic routines are too
specialized - it would be rare that you both could match their usage pattern
*and* would remember that they exist. Delete by item seems error-prone if
there are multiple identical items in the container (does it delete just one
or all of them? Explain your choice, and why the user would expect that over
the other possibility.) Reverse_List (which probably should just be called
"Reverse") doesn't seem that useful, and is masking a lot of work. So I'd
probably dump the whole lot. But I do agree that List and Vector should be
the same, whatever is decided.
> Here's another idea. We already have a Generic_Update, but another
> useful operation might be some kind of query operation, that either
> returns Boolean or a type you pass in as a generic formal.
> Something like:
>
> generic
> type Result_Type (<>) is limited private;
> with function Process (E : ET) return Result_Type is <>;
> function Generic_Query (Position : Cursor) return Result_Type;
This seems too specialized to me. Most of the time, it would make just as
much sense to write a function of the Element. Besides, this seems like it
would be illegal if AI-318 is passed as currently planned, since limited
unconstrained types will not be allowed to be returned. So there is a
contract issue here (having a function that has to be able to both
build-in-place and return-by-copy seems like a very nasty case for generic
sharing implementations).
In any case, we need to avoid "feeping creaturism" here. KISS definitely
applies!
****************************************************************
From: Pascal Obry
Sent: Wednesday, June 9, 2004 10:43 AM
One feedback after migrating AWS to the AI302 reference implementation. The
procedure Size and Length are really too confusing. I have at least 2 times
used the wrong one (using Size instead of Length). Length is ok, maybe Size
should be renamed Hash_Size or something like that.
For the record:
function Size (Container : Vector) return Size_Type;
-> returns the size of the hash table (number of buckets)
function Length (Container : Vector) return Size_Type;
-> returns the number of items in the vector
Also, as Size and Resize are low-level stuff I would put those routines at the
end of the package. Another solution would be to put such routines into a
child package. Thoughts ?
****************************************************************
From: Matthew Heaney
Sent: Wednesday, June 9, 2004 2:55 PM
Pascal Obry wrote:
> One feedback after migrating AWS to the AI302 reference implementation. The
> procedure Size and Length are really too confusing. I have at least 2 times
> used the wrong one (using Size instead of Length). Length is ok, maybe Size
> should be renamed Hash_Size or something like that.
It's not unlike for an array, which has both 'Length and 'Size attributes.
> For the record:
>
> function Size (Container : Vector) return Size_Type;
> -> returns the size of the hash table (number of buckets)
No. The Size of a hashed map container specifies the maximum length
(number of items) before which automatic expansion of the internal hash
table occurs. It does *not* specify the number of buckets in the hash
table.
(It is indeed the case that in the AI-302 reference implementation,
function Size happens to return the number of hash table buckets, but
that is a characteristic of that particular implementation. It is not
guaranteed to be the case for all implementations.)
> function Length (Container : Vector) return Size_Type;
> -> returns the number of items in the vector
Technically it's the "number of active elements," but let's not quibble.
> Also, as Size and Resize are low-level stuff I would put those routines at the
> end of the package. Another solution would be to put such routines into a
> child package. Thoughts ?
It's a bad idea.
****************************************************************
From: Pascal Obry
Sent: Wednesday, June 9, 2004 3:26 PM
What is a bad idea ? I have proposed 3 things :
- rename Size and keep Length
- move the Size and Resize to the end of the API
- move the Size and Resize routines into a child package
I hope that you at least see that Size/Length having the same prototype
is dangerous. It is even more dangerous that using Size instead of Length
can stay undetected for some time...
****************************************************************
From: Matthew Heaney
Sent: Wednesday, June 9, 2004 3:44 PM
I was referring to the suggestion in your last paragraph to make Size
and Resize child subprograms.
****************************************************************
From: Pascal Obry
Sent: Wednesday, June 9, 2004 3:53 PM
Ok, I also think it is a bad idea; it was there for completeness :)
****************************************************************
From: Tucker Taft
Sent: Wednesday, June 9, 2004 3:34 PM
How about "Maximum_Length" and "Set_Maximum_Length" in place
of Size and Resize?
****************************************************************
From: Pascal Obry
Sent: Wednesday, June 9, 2004 3:42 PM
Fine with me.
****************************************************************
From: Robert A Duff
Sent: Wednesday, June 9, 2004 7:23 PM
> What is a bad idea ? I have proposed 3 things :
I don't know Matt's opinion, but here's mine:
> - rename Size and keep Length
Good idea. I think this is fairly important.
> - move the Size and Resize to the end of the API
Good idea. Not important.
> - move the Size and Resize routines into a child package
Bad idea.
> I hope that you at least see that Size/Length having the same prototype
> is dangerous. It is even more dangerous that using Size instead of Length
> can stay undetected for some time...
Yes, I agree. The name Size should be changed to something else,
something nobody would mistake for Length.
****************************************************************
From: Nick Roberts
Sent: Wednesday, June 9, 2004 9:06 PM
> How about "Maximum_Length" and "Set_Maximum_Length" in place
> of Size and Resize?
I endorse this suggestion. Specifically, I suggest:
(1) In package Ada.Containers, change:
type Size_Type is range 0 .. <implementation-defined>;
to:
type Count_Type is range 0 .. <implementation-defined>;
and all subsequent uses of Size_Type be renamed to Count_Type.
(2) In packages Ada.Containers.Vectors, Ada.Containers.Hashed_Maps, (and
Ada.Containers.Indefinite_Hashed_Maps,) change:
function Size (Container : Vector|Map) return Size_Type;
to:
function Maximum_Length (Container : Vector|Map) return Count_Type;
and change:
procedure Resize (Container : in out Vector|Map;
Size : in Size_Type);
to:
procedure Set_Maximum_Length (Container : in out Vector|Map;
To : in Count_Type);
(3) Change all references to the term 'size' to 'maximum length'. For
example, change the second paragraph of the proposed A.17.2 from:
A vector container object manages an unconstrained internal array, which
expands as necessary as items are inserted. The *size* of a vector
corresponds to the total length of the internal array, and the *length*
of a vector corresponds to the number of active elements in the internal
array.
to:
A vector container object conceptually manages an unconstrained internal
array, which expands as necessary as items are inserted. The *maximum
length* of a vector corresponds to the total length of this conceptual
internal array, and the *length* of a vector corresponds to the number
of active elements within this array.
An alternative to 'maximum length' and [Set_]Maximum_Length throughout all
the above could be 'allocated length' and [Set_]Allocated_Length.
This issue has been argued about before. Some said that the term 'size'
clashed with the predominant existing usage of the term in connection with
the number of storage units used up by objects and program units. Others
said that many terms are 'overloaded' in the RM, and the term 'size' is
already used to mean other things in some places.
However, I quite strongly feel that an alternative term could easily be
chosen, and it would be very desirable to do so, to avoid just the kind of
confusion Pascal reported.
I must also add that I still think it is unjustified that the size/maximum
length of a vector or map is not permitted to be reduced by any
implementation. Specifically, I advocate that Resize/Set_Maximum_Length be
allowed (by the standard) to reduce the size/maximum length of a vector or
map, but that implementations are permitted to ignore such reductions if
they wish. In fact, I would suggest that the current wording (forbidding
such reductions) is silly in a way, because I doubt very much that there
will ever be an ACATS test for it. On that basis, I also question the wording
"Resize sets the size of Container to a value which is at least the value
Size", which could more sensibly be changed to "Resize sets the size of
Container to approximately the value Size".
(4) I suggest the paragraph:
If Size (Container) is equal to or greater than Size, the operation does
nothing. Otherwise Resize sets the size of Container to a value which is
at least the value Size, expanding the internal array to hold Size
elements. Expansion will require allocation, and possibly copying and
deallocation of elements. Any exceptions raised by these operations
are propagated, leaving the container with at least the original Size,
Length, and elements.
be changed to:
Set_Maximum_Length sets the maximum length of Container to approximately
the value To, expanding or contracting the internal array as required.
Expansion or contraction may require allocation, and possibly copying and
deallocation of elements. Any exceptions raised by these operations are
propagated, leaving the length and active elements of the container
unchanged.
and that the following AARM notes be changed appropriately, and that this
implementation permission is added:
Implementations are not required to support the [changing|reduction] of the
maximum size of a container by Set_Maximum_Length, in which case calls
of this procedure should do nothing.
I favour the word 'changing', on the basis that Set_Maximum_Length is
probably never going to be ACATS tested for its effect on the size (maximum
length) of a vector or map.
(4) I also suggest that the concept of an 'expansion factor' is added to
vectors and maps. Each vector or map has its own expansion
factor associated with it, which is a value of the subtype
Ada.Containers.Expansion_Factor_Type, declared as follows:
subtype Expansion_Factor_Type is Float range 1.0 .. [impl def];
Whenever a vector or map is expanded automatically, the value of its
expansion factor at the time may be used (but does not have to be) by the
implementation to determine the new maximum length of the container,
nominally by multiplying the current maximum length by the current expansion
factor.
The initial (default) value of the expansion factor of a container is
implementation defined, but its value may be retrieved and set by the
following subprograms:
function Expansion_Factor (Container : Vector|Map)
return Expansion_Factor_Type;
procedure Set_Expansion_Factor
(Container : in out Vector|Map;
To : in Expansion_Factor_Type);
****************************************************************
From: Robert A. Duff
Sent: Thursday, June 10, 2004 7:45 AM
Tuck says:
> How about "Maximum_Length" and "Set_Maximum_Length" in place
> of Size and Resize?
I don't really like "Maximum_Length", because there actually *is* no max
length -- the whole point is these things can grow arbitrarily large.
I believe STL calls them "capacity" and "reserve".
Pretty much anything would be better than "Size", for the reasons Pascal
stated.
****************************************************************
From: Matthew Heaney
Sent: Thursday, June 10, 2004 10:10 AM
Well, it does describe when expansion happens. How about:
function Expansion_Length
(Container : in Map) return Size_Type;
procedure Set_Expansion_Length
(Container : in out Map;
Length : in Size_Type);
****************************************************************
From: Alexander E. Kopilovich
Sent: Thursday, June 10, 2004 11:31 AM
Another proposition:
function Extent -- or Current_Extent
and
procedure Set_Extent -- correspondingly, Set_Current_Extent
But perhaps the best would be to say straight:
function Reserved_Length -- or Reserved_Size
and
procedure Set_Reserved_Length -- correspondingly, Set_Reserved_Size
****************************************************************
From: Tucker Taft
Sent: Thursday, June 10, 2004 1:42 PM
> But perhaps the best would be to say straight:
>
> function Reserved_Length
>
> and
>
> procedure Set_Reserved_Length
I like these. "Capacity" is pretty much a synonym for
"Maximum_Length". Both need the word "Current" added to
make it clear these are expandable.
"Reserved" has just the right connotation.
By the way, I agree that there seems no reason not to
allow Set_Reserved_Length to specify a smaller length,
though we then want Reserved_Length to be allowed to
return a value larger than the value most recently set
by Set_Reserved_Length. Which might argue for changing
the "set" procedure's name to "Set_Minimum_Reserved_Length"
and the function's name to "Actual_Reserved_Length" to
be crystal clear.
****************************************************************
From: Alexander E. Kopilovich
Sent: Thursday, June 10, 2004 5:13 PM
Perhaps for this purpose "Provide_Reserved_Length" would be even better
(again, more straight) than "Set_Minimum_Reserved_Length".
And additionally, as "provide" (unlike "set") is somehow uncertain about
the upper limit, there will be less need for the prefix "Actual_" before
"Reserved_Length".
****************************************************************
From: Nick Roberts
Sent: Friday, June 11, 2004 9:13 PM
I like these suggestions. I quite like 'Actual_Reserved_Length', but I think
it's not really necessary, since there is no other function
('Requested_Reserved_Length' or some such) for it to be contrasted with.
Perhaps a consensus is coming close to:
- rename the term 'size' as 'reserved length';
- rename the 'Size' functions as 'Reserved_Length';
- rename the 'Resize' procedures as 'Request_Reserved_Length'.
I would also like to suggest:
- rename the type 'Size_Type' as 'Count' or 'Count_Type'.
My justification for this is that the term 'size' is mainly used in
connection with storage units, so some potential for confusion would be
easily avoided by a different name, and a type 'Count' fulfilling a very
similar role is declared in the Ada.*_IO packages.
I would like to reiterate my suggestions that:
- the Request_Reserved_Length (Resize) procedures are permitted to reduce
the reserved length (size) of a container, but that in any case (reduction
or expansion) any actual change to the reserved length (size) remains
implementation defined;
- an explicit expansion factor is supported, as in my previous post.
My justification for the first is that it would not be sensible to formally
test whether an implementation obeyed a more stringent definition. The
Reserved_Length (Size) functions should return the actual reserved length
(size), but again, it would probably not be sensible to try to formally test
this (and it may not be possible).
My justification for the second is that there will often be situations where
the user (of the proposed container packages) knows better than the
implementation what the expansion factor should be, and in such cases the
implementation default for deciding by how much to expand a container
(whether by a simple factor or some other method) is likely to be very
inappropriate. [Sorry about numbering this point '(4)' in my previous post;
it should have been '(5)'.]
****************************************************************
From: Michael F. Yoder
Sent: Saturday, June 12, 2004 9:18 AM
>Perhaps a consensus is coming close to:
>
>- rename the term 'size' as 'reserved length';
>
>- rename the 'Size' functions as 'Reserved_Length';
>
>- rename the 'Resize' procedures as 'Request_Reserved_Length'.
If there is such a consensus, I'll add my support to it. These seem
like good ideas.
>I would also like to suggest:
>
>- rename the type 'Size_Type' as 'Count' or 'Count_Type'.
>
>My justification for this is that the term 'size' is mainly used in
>connection with storage units, so some potential for confusion would be
>easily avoided by a different name, and a type 'Count' fulfilling a very
>similar role is declared in the Ada.*_IO packages.
I agree.
>I would like to reiterate my suggestions that:
>
>- the Request_Reserved_Length (Resize) procedures are permitted to reduce
>the reserved length (size) of a container, but that in any case (reduction
>or expansion) any actual change to the reserved length (size) remains
>implementation defined;
I strongly agree. Requiring that the user write the size reduction code
via a copy forecloses even the possibility of a reduction that avoids
copying.
>- an explicit expansion factor is supported, as in my previous post.
>
>My justification for the first is that it would not be sensible to formally
>test whether an implementation obeyed a more stringent definition. The
>Reserved_Length (Size) functions should return the actual reserved length
>(size), but again, it would probably not be sensible to try to formally test
>this (and it may not be possible).
>
>My justification for the second is that there will often be situations where
>the user (of the proposed container packages) knows better than the
>implementation what the expansion factor should be, and in such cases the
>implementation default for deciding by how much to expand a container
>(whether by a simple factor or some other method) is likely to be very
>inappropriate. [Sorry about numbering this point '(4)' in my previous post;
>it should have been '(5)'.]
I'm less enthusiastic about the expansion factor, but I don't oppose it.
****************************************************************
From: Robert A. Duff
Sent: Monday, June 14, 2004 8:55 AM
Mike Yoder wrote:
> Nick Roberts wrote:
>
> >- the Request_Reserved_Length (Resize) procedures are permitted to reduce
> >the reserved length (size) of a container, but that in any case (reduction
> >or expansion) any actual change to the reserved length (size) remains
> >implementation defined;
> >
> I strongly agree. Requiring that the user write the size reduction code
> via a copy forecloses even the possibility of a reduction that avoids
> copying.
The STL guarantees that the reserved size is at least that requested.
This is important because it means that cursors/iterators that
point into the data structure do not become invalid while
appending (up to that reserved size).
****************************************************************
From: Nick Roberts
Sent: Sunday, June 20, 2004 1:53 PM
However, the semantics required by the current AI-302 is clearly different.
The relevant wording is:
A Cursor value is *ambiguous* if any of the following have occurred
since it was created:
* Insert or Delete has been called on the vector that contains the
element the cursor designates with an index value (or a cursor
designating an element at such an index value) less than or equal
to the index value of the element designated by the cursor;
* The vector that contains the element it designates has been
passed to an instance of Generic_Sort.
and:
A Cursor value is *invalid* if any of the following have occurred
since it was created:
* The vector that contains the element it designates has been
finalized;
* The vector that contains the element it designates has been
used as the Source or Target of a call to Move;
* The element it designates has been deleted.
The result of "=" or Has_Element is unspecified if it is called with
an invalid cursor parameter. Execution is erroneous if any other
subprogram declared in Containers.Vectors is called with an
invalid cursor parameter, or if the cursor designates an element in
a different vector object than the appropriate one specified in the
call.
AARM Notes:
The list above (combined with the bounded error cases) is
intended to be exhaustive. In other cases, a cursor value
continues to designate its original element. For instance,
cursor values survive the appending of new elements.
End AARM Notes.
Cursors are not permitted to become ambiguous or invalid solely because of
internal copying (as a result of automatic extension).
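[Editor's note: The property quoted above (cursor values survive the
appending of new elements) can be exercised with a short sketch. It assumes
the Ada.Containers.Vectors package as later standardized; the unit name
Cursor_Survival_Sketch is hypothetical.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Cursor_Survival_Sketch is
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Int_Vectors;

   V : Vector;
   C : Cursor;
begin
   V.Append (7);
   C := V.First;  --  designates the element at index 1

   --  Appending enough elements to force the implementation to expand
   --  (and very likely copy) its internal storage several times.
   for I in 1 .. 10_000 loop
      V.Append (I);
   end loop;

   --  All insertions happened at indexes greater than that of C, so C
   --  still designates its original element.
   Ada.Text_IO.Put_Line (Integer'Image (Element (C)));
end Cursor_Survival_Sketch;
```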
****************************************************************
From: Randy Brukardt
Sent: Wednesday, June 23, 2004 9:43 PM
Right. That's an important property: cursors do not become invalid because
of an action that is outside of the user's control. And memory management in
a container is outside of the user's control.
Resize (I forget the new name we settled on, so I'll use the old one for
now) is purely a performance enhancing routine. The only requirement is that
Size (ditto on the name) returns the value most recently passed into Resize,
or something larger. There's an AARM note suggesting to implementors that
Resize allocate at least the specified memory, but of course that is
untestable and cannot be specified in normative language of the standard.
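[Editor's note: A minimal sketch of the requirement described above, assuming
the names used in the Ada.Containers.Vectors package as later standardized
(Capacity and Reserve_Capacity rather than Size and Resize). The unit name
Capacity_Sketch is hypothetical.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Capacity_Sketch is
   use Ada.Containers;

   package Int_Vectors is new Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Int_Vectors.Vector;
begin
   V.Reserve_Capacity (100);

   --  The only testable requirement: the reported capacity is the value
   --  most recently requested, or something larger. These assertions are
   --  checked only when assertions are enabled.
   pragma Assert (V.Capacity >= 100);

   --  Reserving storage does not change the number of active elements.
   pragma Assert (V.Length = 0);

   for I in 1 .. 10 loop
      V.Append (I);
   end loop;

   Ada.Text_IO.Put_Line ("Length   =" & Count_Type'Image (V.Length));
   Ada.Text_IO.Put_Line ("Capacity =" & Count_Type'Image (V.Capacity));
end Capacity_Sketch;
```

Length reports the number of active elements, while Capacity is purely a
storage hint; confusing the two is the error Pascal Obry reported earlier in
this thread.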
****************************************************************
From: Simon Wright
Sent: Friday, June 11, 2004 3:03 AM
> possibility.) Reverse_List (which probably should just be called
> "Reverse")
If it wasn't a reserved word!
****************************************************************
From: Matthew Heaney
Sent: Sunday, June 27, 2004 5:35 PM
Randy:
I have a few comments about the Palma API release. The actual text of my
comments is bracketed by "MJH:" and "ENDMJH." pairs, and immediately
follows the operation(s) to which they refer.
vector:
MJH:
The partial view of type Vector is now tagged, like this:
type Vector is tagged private;
Ditto for the other containers.
ENDMJH.
function To_Vector (Count : Count_Type) return Vector;
function To_Vector (New_Item : Element_Type;
Count : Count_Type)
return Vector;
MJH:
We need to affirm whether the parameter should be named "Count" or "Length".
ENDMJH.
function Capacity (Container : Vector) return Count_Type;
procedure Set_Capacity (Container : in out Vector;
Capacity : in Count_Type);
MJH:
I declared the operations formerly named "Size" and "Resize" as above.
ENDMJH.
generic
with procedure Process (Element : in out Element_Type);
procedure Generic_Update_Element_By_Index (Container : in Vector;
Index : in Index_Type'Base);
generic
with procedure Process (Element : in out Element_Type);
procedure Generic_Update_Element (Position : in Cursor);
MJH:
We don't need different names for these operations anymore, since they're
not generic and hence we can overload the names as follows (verify my syntax
is correct):
procedure Update_Element (Container : in Vector;
Index : in Index_Type'Base;
Process : access procedure (Element : in out Element_Type));
procedure Update_Element (Position : in Cursor;
Process : access procedure (Element : in out Element_Type));
ENDMJH.
procedure Set_Length (Container : in out Vector;
Length : in Size_Type);
MJH:
Is this vector operation missing?
procedure Set_Length (Container : in out Container_Type;
Length : in Size_Type;
New_Item : in Element_Type);
This would allow you to specify a value for elements that become active when
Length > Length (Container).
ENDMJH.
procedure Swap (Container : in Vector;
I, J : in Index_Type'Base);
procedure Swap (Container : in out Vector;
I, J : in Cursor);
MJH:
The declaration of the second Swap operation (the one for which I and J have
type Cursor) appears to be incorrect. Firstly, the Container parameter is
inout. (This was probably for symmetry with the cursor-based Swap for the
List container -- see below.) It would only need to be in-mode. However, I
think the real error is that there is a container parameter at all. It is
never the case that you need to pass a container when all you're doing is
manipulating an element through a cursor (e.g. E := Element (C)).
I think the cursor-based swap operation should be declared this way:
procedure Swap (I, J : in Cursor);
The comment also applies to the (cursor-based) swap operation for the List
containers.
(This change is really a consequence of clarifying the semantics of swap for
list containers during the ARG meeting in Palma.)
ENDMJH.
MJH:
Should we weaken the precondition, allowing I and J to have the value
No_Element, in which case Swap is a no-op?
ENDMJH.
function Is_In (Item : Element_Type;
Container : Vector)
return Boolean;
MJH:
As a consequence of making the vector type tagged, the parameters should be
put in the opposite order, like this:
function Is_In (Container : Vector;
Item : Element_Type)
return Boolean;
ENDMJH.
generic
with procedure Process (Position : in Cursor);
procedure Generic_Iteration (Container : in Vector);
generic
with procedure Process (Position : in Cursor);
procedure Generic_Reverse_Iteration (Container : in Vector);
MJH:
These operations aren't generic anymore. Also, the name should probably be
changed to use verb-style instead of noun-style (the existing name is
consistent with the style for generic operations such as Unchecked_Deallocation,
etc):
procedure Iterate (Container : in Vector;
Process : access procedure (Position : in Cursor));
procedure Reverse_Iterate (Container : in Vector;
Process : access procedure (Position : in Cursor));
ENDMJH.
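[Editor's note: A sketch of how the non-generic, access-procedure style
discussed above reads at the call site. It assumes the Iterate,
Reverse_Iterate and Update_Element operations of Ada.Containers.Vectors as
later standardized; note that in that package the cursor-based
Update_Element also takes the container as its first parameter. The names
Iterate_Sketch, Double, Double_At and Print are hypothetical.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Iterate_Sketch is
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Int_Vectors;

   V : Vector;

   --  Modifies one element in place.
   procedure Double (E : in out Integer) is
   begin
      E := 2 * E;
   end Double;

   --  Called once per cursor by Iterate; forwards to Update_Element.
   procedure Double_At (Position : Cursor) is
   begin
      V.Update_Element (Position, Double'Access);
   end Double_At;

   procedure Print (Position : Cursor) is
   begin
      Ada.Text_IO.Put_Line (Integer'Image (Element (Position)));
   end Print;

begin
   V.Append (1);
   V.Append (2);
   V.Append (3);

   V.Iterate (Double_At'Access);      --  visits elements first to last
   V.Reverse_Iterate (Print'Access);  --  visits elements last to first
end Iterate_Sketch;
```

No instantiation is needed for either operation; a local procedure is passed
directly with 'Access.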
list:
procedure Swap (Container : in out List;
I, J : in Cursor);
MJH:
The semantics of Swap were clarified in Palma so that only elements are
swapped, and not nodes. Hence there is no need for a container parameter,
and the swap operation should be declared like this:
procedure Swap (I, J : in Cursor);
This cursor-based swap operation for Vector should be declared similarly.
ENDMJH.
MJH:
As for the Vector, the parameters for Is_In should be reordered so that the
container parameter is the first parameter.
ENDMJH.
map:
generic
type Key_Type is private;
type Element_Type is private;
with function Hash (Key : Key_Type)
return Hash_Type is <>;
with function Is_Equal_Key (Left, Right : Key_Type)
return Boolean is "=";
with function "=" (Left, Right : Element_Type)
return Boolean is <>;
package AI302.Containers.Hashed_Maps is ...;
MJH:
There was a lot of talk between Tucker and Pascal about the declaration of
the generic formal region for the hashed map. I think Tucker wanted it to look
like this:
generic
type Key_Type is private;
type Element_Type is private;
with function Hash (Key : Key_Type)
return Hash_Type; --VERIFY WHETHER THERE'S NO DEFAULT
with function "=" (Left, Right : Key_Type)
return Boolean is <>;
with function Equivalent (Left, Right : Key_Type)
return Boolean is "=";
with function "=" (Left, Right : Element_Type)
return Boolean is <>;
package AI302.Containers.Hashed_Maps is ...;
We agreed that the map container would use keys to compute map container
equality. My question is exactly how this should be done. Firstly, what is
the purpose for passing in key equality as a generic formal parameter? Is
it merely to supply a default for Equivalent? Or is it also used for some
other purpose (perhaps to compute map equality)?
To compute map equality, we do something like this:
(1) Compare lengths; if they're different, then return false.
(2a) For each key in the left (say) map, see if it's in the right map.
If it's not found, then return false.
(2b) If the key is found, then compare the elements. If they're not
equal, then return false.
My question is really about step (2a), about what it means to "compare
keys." When we check to see if the key of the left map is in the right map,
we do what we already do during insertion and deletion, by computing the hash
value and then calling Equivalent (formerly called "Is_Equal_Key"). So what
is the purpose of key equality "="? Do we use key equality for some purpose
other than providing a default for Equivalent? Or do we somehow incorporate
an explicit call to key "=" when we "compare keys" during computation of map
equality?
ENDMJH.
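[Editor's note: Steps (1), (2a) and (2b) above can be sketched as a
user-level function. This is only an illustration of the algorithm as
described in the message, not the wording of any proposal. It assumes
Ada.Containers.Hashed_Maps as later standardized, where the key-equivalence
formal is named Equivalent_Keys; the lookup in step (2a) goes through Find,
which uses Hash and the equivalence function and never calls a separate key
"=". The names Map_Equality_Sketch, Hash and Same_Contents are hypothetical.]

```ada
with Ada.Containers.Hashed_Maps;
with Ada.Text_IO;

procedure Map_Equality_Sketch is
   use Ada.Containers;

   --  Hypothetical hash function for Integer keys.
   function Hash (Key : Integer) return Hash_Type is
   begin
      return Hash_Type'Mod (Key);
   end Hash;

   package Int_Maps is new Hashed_Maps
     (Key_Type        => Integer,
      Element_Type    => Integer,
      Hash            => Hash,
      Equivalent_Keys => "=");
   use Int_Maps;

   function Same_Contents (Left, Right : Map) return Boolean is
      L : Cursor := Left.First;
      R : Cursor;
   begin
      --  (1) Compare lengths.
      if Left.Length /= Right.Length then
         return False;
      end if;

      while Has_Element (L) loop
         --  (2a) Look up the left key in the right map.
         R := Right.Find (Key (L));
         if not Has_Element (R) then
            return False;
         end if;

         --  (2b) Compare the associated elements.
         if Element (L) /= Element (R) then
            return False;
         end if;

         Next (L);
      end loop;

      return True;
   end Same_Contents;

   A, B : Map;
begin
   --  Same key/element pairs, inserted in a different order.
   A.Insert (1, 10);
   A.Insert (2, 20);
   B.Insert (2, 20);
   B.Insert (1, 10);

   Ada.Text_IO.Put_Line (Boolean'Image (Same_Contents (A, B)));
end Map_Equality_Sketch;
```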
MJH:
Another point: the name "Equivalent" is also inconsistent with cursor
operations named "Is_Equal_Key". Was this intended? Should we leave the
formal operation named "Is_Equal_Key" as is, or change the cursor operations
to use the name "Equivalent"?
ENDMJH.
procedure Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Position : out Cursor;
Success : out Boolean);
procedure Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
procedure Insert (Container : in out Map;
Key : in Key_Type;
Position : out Cursor;
Success : out Boolean);
MJH:
We overloaded the insertion operations for list to include overloadings that
omit a cursor parameter. Should we provide similar overloadings for maps
(and sets -- see below) too? Something like this:
procedure Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
(I have omitted an overloading that just accepts a key, since in general we
need a cursor in order to give the element a value following the insertion
proper.)
ENDMJH.
sets:
procedure Insert (Container : in out Set;
New_Item : in Element_Type;
Position : out Cursor;
Success : out Boolean);
MJH:
The following overloading that omits the cursor parameter would be useful:
procedure Insert (Container : in out Set;
New_Item : in Element_Type);
I've had a need for this operation, as has Georg Bauhaus (per CLA). It
would also be consistent with list, which has an overloading that omits the
cursor parameter.
ENDMJH.
MJH:
We have a Replace operation for maps, but nothing similar for sets. It may
make sense to include this set operation too:
procedure Replace (Container : in out Set;
New_Item : in Element_Type);
ENDMJH.
function Is_Subset (Item : Set;
Container : Set)
return Boolean;
function Is_Disjoint (Item : Set;
Container : Set)
return Boolean;
function Is_In (Item : Element_Type;
Container : Set)
return Boolean;
MJH:
Since the container type is tagged, all of these operations need to reorder
the parameters so that the container is first:
function Is_Subset (Container : Set;
Item : Set)
return Boolean;
function Is_Disjoint (Container : Set;
Item : Set)
return Boolean;
function Is_In (Container : Set;
Item : Element_Type)
return Boolean;
ENDMJH.
MJH:
For both Is_Subset and Is_Disjoint, we should clarify the results when one
or both of the params are empty sets.
ENDMJH.
function Find (Container : Set;
Item : Element_Type)
return Cursor;
MJH:
Immediately following the declaration of the Find operation, the following
two operations are declared:
function Ceiling (Container : Set;
Item : Element_Type)
return Cursor;
function Floor (Container : Set;
Item : Element_Type)
return Cursor;
ENDMJH.
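[Editor's note: The following is a minimal sketch of the Floor and Ceiling
semantics. It assumes the Ada.Containers.Ordered_Sets package as it was later
standardized, not the draft under discussion here. The names
Floor_Ceiling_Demo and Integer_Sets are illustrative only.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Floor_Ceiling_Demo is
   --  Assumption: the standardized Ordered_Sets generic, instantiated
   --  for Integer with the predefined "<" and "=".
   package Integer_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Integer);
   use Integer_Sets;

   S : Set;
   C : Cursor;
begin
   S.Insert (10);
   S.Insert (20);
   S.Insert (30);

   --  Floor: the greatest element that is not greater than the item.
   C := S.Floor (25);
   pragma Assert (Has_Element (C) and then Element (C) = 20);

   --  Ceiling: the smallest element that is not less than the item.
   C := S.Ceiling (25);
   pragma Assert (Has_Element (C) and then Element (C) = 30);

   --  An item that is present is its own floor and ceiling.
   pragma Assert (Element (S.Floor (20)) = 20);
   pragma Assert (Element (S.Ceiling (20)) = 20);

   --  When no element qualifies, the result is No_Element.
   pragma Assert (S.Floor (5) = No_Element);
   pragma Assert (S.Ceiling (35) = No_Element);

   Ada.Text_IO.Put_Line ("Floor and Ceiling checks completed");
end Floor_Ceiling_Demo;
```

The assertions are only evaluated when assertion checking is enabled in the
compiler.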
****************************************************************
From: Matthew Heaney
Sent: Sunday, June 27, 2004 6:34 PM
> MJH:
> For both Is_Subset and Is_Disjoint, we should clarify the
> results when one or both of the params are empty sets.
> ENDMJH.
>
> function Find (Container : Set;
> Item : Element_Type)
> return Cursor;
>
> MJH:
> Immediately following the declaration of the Find operation,
> the following two operations are declared:
>
> function Ceiling (Container : Set;
> Item : Element_Type)
> return Cursor;
>
> function Floor (Container : Set;
> Item : Element_Type)
> return Cursor;
> ENDMJH.
MJH:
I forgot to mention here that Ceiling and Floor should also be added to the
set nested package Generic_Keys:
function Is_In (Container : Set;
Key : Key_Type)
return Boolean;
function Find (Container : Set;
Key : Key_Type)
return Cursor;
function Ceiling (Container : Set;
Key : Key_Type)
return Cursor;
function Floor (Container : Set;
Key : Key_Type)
return Cursor;
ENDMJH.
MJH:
Note also that the generic operation Generic_Keys.Generic_Insertion has been
removed.
ENDMJH.
****************************************************************
From: Nick Roberts
Sent: Sunday, June 27, 2004 5:50 PM
[My comments are between NJR and ENDNJR.]
"Matthew Heaney" <matthewjheaney@earthlink.net> wrote:
function To_Vector (Count : Count_Type) return Vector;
function To_Vector (New_Item : Element_Type;
Count : Count_Type)
return Vector;
MJH:
We need to affirm whether the parameter should be named "Count" or "Length".
ENDMJH.
NJR:
The name 'Length' seems very appropriate to me.
ENDNJR
MJH:
Is this vector operation missing?
procedure Set_Length (Container : in out Container_Type;
Length : in Size_Type;
New_Item : in Element_Type);
This would allow you to specify a value for elements that become active when
Length > Length (Container).
ENDMJH.
NJR
I think this procedure might make sense (but it might be considered
overkill). I guess 'Count_Type' was meant instead of 'Size_Type'.
ENDNJR
procedure Swap (I, J : in Cursor);
NJR
Might it be slightly clearer for the name to be 'Swap_Elements'?
ENDNJR
MJH:
Should we weaken the precondition, allowing I and J to have the value
No_Element, in which case Swap is a no-op?
ENDMJH.
NJR
All the algorithms that I can think of which swap elements in an array (of
some kind) necessarily have a pre-test for validity before doing the swap. I
therefore think it would be a useful bug-catcher to specify that an
exception is raised if I or J is No_Element. By analogy to an Ada array,
doing T := A(I); A(I) := A(J); A(J) := T; would raise Constraint_Error if I
or J were out of the range of A.
ENDNJR
function Is_In (Item : Element_Type;
Container : Vector)
return Boolean;
MJH:
As a consequence of making the vector type tagged, the parameters should be
put in the opposite order, like this:
function Is_In (Container : Vector;
Item : Element_Type)
return Boolean;
ENDMJH.
NJR
The obvious objection to this is that it would lack consistency with the
Ada.Strings.Maps package. But I think I tentatively agree with Matt, in
which case the name should probably be changed to something like 'Contains',
'Includes', or 'Has'.
ENDNJR
procedure Swap (I, J : in Cursor);
NJR
Again, maybe 'Swap_Elements' would be a slightly clearer name.
ENDNJR
MJH:
As for the Vector, the parameters for Is_In [for lists] should be reordered so
that the container parameter is the first parameter.
ENDMJH.
NJR
Again, in which case the name should be something like 'Contains',
'Includes', or 'Has'.
ENDNJR
MJH:
Since the container type is tagged, all of these operations need to reorder
the parameters so that the container is first:
function Is_Subset (Container : Set;
Item : Set)
return Boolean;
function Is_Disjoint (Container : Set;
Item : Set)
return Boolean;
function Is_In (Container : Set;
Item : Element_Type)
return Boolean;
ENDMJH.
NJR
Again, the objection is that there would be a lack of consistency with
Ada.Strings.Maps. If the order of the parameters were to be changed, it
would seem that alternative names ought to be chosen for Is_Subset and
Is_In. For example:
function Contains_All (Container : Set;
Item : Set)
return Boolean;
function Contains (Container : Set;
Item : Element_Type)
return Boolean;
ENDNJR
****************************************************************
From: Matthew Heaney
Sent: Sunday, June 27, 2004 10:45 PM
> MJH:
> Is this vector operation missing?
>
> procedure Set_Length (Container : in out Container_Type;
> Length : in Size_Type;
> New_Item : in Element_Type);
>
> This would allow you to specify a value for elements that
> become active when Length > Length (Container).
> ENDMJH.
>
> NJR
> I think this procedure might make sense (but it might be
> considered overkill). I guess 'Count_Type' was meant instead
> of 'Size_Type'.
> ENDNJR.
Yes, that was a cut and paste error. The Length parameter should have type
Count_Type. (Size_Type is gone.)
****************************************************************
From: Matthew Heaney
Sent: Sunday, June 27, 2004 10:52 PM
> map:
>
> package AI302.Containers.Hashed_Maps is ...;
I forgot to mention the changes to these two map operations:
function Size (Container : Map) return Size_Type;
procedure Resize (Container : in out Map;
Size : in Size_Type);
MJH:
I made the name changes here the same as for the vector:
function Capacity (Container : Map) return Count_Type;
procedure Set_Capacity (Container : in out Map;
Capacity : in Count_Type);
ENDMJH.
****************************************************************
From: Pascal Leroy
Sent: Monday, June 28, 2004 2:17 AM
Matt wrote:
> function Is_In (Item : Element_Type;
> Container : Vector)
> return Boolean;
>
> MJH:
> As a consequence of making the vector type tagged, the
> parameters should be put in the opposite order, like this:
>
> function Is_In (Container : Vector;
> Item : Element_Type)
> return Boolean;
> ENDMJH.
I disagree. The reason why we care which parameter comes first is of
course the Object.Operation notation introduced by AI 252. However, in
this case we want to actively prevent the use of this notation. With your
proposed change a call to Is_In could be written:
My_Vector.Is_In (My_Element)
which reads exactly backwards. This operation is a case where the
parameters have a "natural" order and we don't want to change it. (I wish
we could redefine "in", but that's a different topic.)
****************************************************************
From: Cyrille Comar
Sent: Monday, June 28, 2004 3:38 AM
Maybe "Is_In" should be renamed "Contains" which has the opposite
"natural" order:
My_Vector.Contains (My_Element)
looks better...
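[Editor's note: A minimal sketch of how the prefixed call reads with a
Contains function. It assumes the Contains operation of Ada.Containers.Vectors
as later standardized; at the time of this message the name was still under
debate. Contains_Demo and Integer_Vectors are illustrative names.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Contains_Demo is
   --  Assumption: the standardized Vectors generic, whose Vector type
   --  is tagged, so prefixed (Object.Operation) calls are allowed.
   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   My_Vector : Integer_Vectors.Vector;
begin
   My_Vector.Append (1);
   My_Vector.Append (2);

   --  Prefixed notation: the container comes first and the call reads
   --  in the "natural" order.
   if My_Vector.Contains (2) then
      Ada.Text_IO.Put_Line ("2 is in the vector");
   end if;

   --  The same function called with traditional notation.
   if not Integer_Vectors.Contains (My_Vector, 3) then
      Ada.Text_IO.Put_Line ("3 is not in the vector");
   end if;
end Contains_Demo;
```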
****************************************************************
From: Pascal Leroy
Sent: Monday, June 28, 2004 4:28 AM
I like the idea. I have never been very happy with the name Is_In anyway.
****************************************************************
From: Matthew Heaney
Sent: Monday, June 28, 2004 8:54 AM
As Nick pointed out, the name Is_In comes from Ada.Strings.Maps (RM95
A.4.2 (13)). (That package also has an Is_Subset operation, with
parameters in the same order as Is_In.)
So I guess it's a choice between consistency with other parts of RM95,
or trying to take advantage of new syntax allowed by Ada 0Y.
****************************************************************
From: Georg Bauhaus
Sent: Wednesday, June 30, 2004 7:25 AM
!topic unchecked Insert for Sets and Maps
!reference RM95-A.17 [AI95-00302-03]
!from Author Georg Bauhaus 04-06-30
!discussion
There are similar Insert procedures for both Ordered_Sets
and Hashed_Maps with highly useful Position and Success
parameters. Sometimes however, it seems somewhat
disturbing to see declarations of variables for Position
and Success that are not read because it is considered
safe to ignore them. It might be known that Insert will
succeed without surprises (ceteris paribus). Examples
include adding initial values to a library level
container, like 69 keywords, or adding known border values
to ordered containers.
A work around is to have wrapper procedures providing
variables necessary for Insert. But does this not incur
more verbiage and/or withing than is desirable? Adding a
convenient procedure to the containers seems easy. In a
sense, this might also make Ordered_Sets and Hashed_Maps
correspond more closely to Vectors and Doubly_Linked_Lists
with regard to Insert procedures.
(When using maps, I sometimes think of them as sparse
arrays. With arrays, I can just write
ary(key) := value;
and be done.)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, June 30, 2004 3:07 PM
You have that operation already; it's called Replace:
Replace (Map, Key, New_Item => Value);
However, there's nothing like that for the ordered sets. It would
appear useful as an adjunct to Update_Element.
I agree that having overloadings of Insert for sets and maps that omit
the Position and Success parameters would be handy. It's often the case
that you know an insertion will succeed, so having to declare a Boolean
object that you don't bother inspecting is kind of a pain.
****************************************************************
From: Pascal Leroy
Sent: Wednesday, June 30, 2004 3:30 PM
These are sensible suggestions. However, I have to remind you that if
this AI is not approved at the Madison ARG meeting it won't be in the
Amendment. You should be cautious when considering the addition of new
features: at this point we should really be crossing the t's and dotting
the i's.
This is especially important given that some countries expressed concerns
regarding the maturity of this particular AI at the WG9 level.
****************************************************************
From: Marius Amado Alves
Sent: Thursday, July 1, 2004 5:43 AM
I think Matthew is 'more right' than Jeffrey. But please note I tend to
NOT advise changing the spec now, for the reasons Pascal gave. So this
is mainly academic now, and sorry for being slightly OT, but I love real
code examples. Two follow, from Mneson.Base (AI302 version 20040227). In
(1) I know in advance that the element is new. In (2) I want the set
semantics (unique elements) guaranteed by the container, so I don't care
about the success value. You'll note I tried hard to name the dummy
variables in accordance with the circumstances.
(1)
-- ...
use String_Maps;
use Short_String_IO;
Found, Dont_Need : String_Maps.Cursor_Type;
Expected_True : Boolean;
begin
Found := Find (String_Table, Value);
if Found /= Null_Cursor then
X := Element (Found);
else -- new string
-- code here to store the value
-- in a Short_String_IO file
Insert
(Map => String_Table,
Key => Value,
New_Item => X,
Cursor => Dont_Need,
Success => Expected_True);
end if;
-- ...
(2)
procedure Connect (Source, Target : Vertex) is
use Link_Sets;
Dont_Need : Cursor_Type;
Dont_Care : Boolean;
begin
Insert
(Set => Links,
New_Item => (Source, Target),
Cursor => Dont_Need,
Success => Dont_Care);
Insert
(Set => Inv_Links,
New_Item => (Target, Source),
Cursor => Dont_Need,
Success => Dont_Care);
end;
****************************************************************
From: Matthew Heaney
Sent: Thursday, July 1, 2004 9:31 AM
Note that (1) isn't the most efficient way to do this, since Insert
duplicates the effort of Find. I recommend doing it this way instead:
C : Cursor;
Not_Already_In_Map : Boolean;
begin
Insert (String_Table, Value, C, Not_Already_In_Map);
if Not_Already_In_Map then
Replace_Element (C, By => X);
else
X := Element (C);
end if;
end;
Here, we use the item-less version of Insert to perform an insertion
attempt, and then give the item a proper value if the key was actually
inserted during this attempt.
If the attempt fails, it's because the key was already in the map, so
you can then interrogate the item associated with that key.
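[Editor's note: A self-contained sketch of the single-search idiom described
above. It assumes the Ada.Containers.Ordered_Maps package as later
standardized, where the flag is named Inserted rather than Success and
Replace_Element takes the container as its first parameter. The names
Itemless_Insert_Demo, ID_Maps, Table, Next_ID and Lookup_Or_Add are
hypothetical.]

```ada
with Ada.Containers.Ordered_Maps;
with Ada.Text_IO;

procedure Itemless_Insert_Demo is
   package ID_Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Character, Element_Type => Positive);
   use ID_Maps;

   Table   : Map;
   Next_ID : Positive := 1;

   --  Return the ID stored for Key, assigning a fresh one if Key is new.
   --  The map is searched only once, by the item-less Insert.
   procedure Lookup_Or_Add (Key : in Character; ID : out Positive) is
      C        : Cursor;
      Inserted : Boolean;
   begin
      Table.Insert (Key, C, Inserted);   --  insertion attempt, no element
      if Inserted then
         --  Key was new: give its element a proper value now.
         ID      := Next_ID;
         Next_ID := Next_ID + 1;
         Table.Replace_Element (C, ID);
      else
         --  Key was already present: read the existing element.
         ID := Element (C);
      end if;
   end Lookup_Or_Add;

   A1, B1, A2 : Positive;
begin
   Lookup_Or_Add ('a', A1);
   Lookup_Or_Add ('b', B1);
   Lookup_Or_Add ('a', A2);

   pragma Assert (A1 = 1 and then B1 = 2);
   pragma Assert (A2 = A1);   --  second lookup of 'a' reuses the stored ID

   Ada.Text_IO.Put_Line ("Lookups completed");
end Itemless_Insert_Demo;
```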
****************************************************************
From: Marius Amado Alves
Sent: Thursday, July 1, 2004 10:40 AM
Yes, I think it applies. Thanks. This kind of optimization is on the
Mneson to-do list. Everyone is welcome to join Mneson development ;-)
****************************************************************
From: Jeffrey Carter
Sent: Wednesday, June 30, 2004 7:59 PM
These are no doubt useful operations. However, as any student of
defensive programming knows, operations that you know will succeed
don't, often enough to be a problem. Since these components must ensure
their internal consistency whenever possible, these operations would
still have to check that they succeed, and raise an appropriate
exception if they don't.
****************************************************************
From: Matthew Heaney
Sent: Thursday, July 1, 2004 9:32 AM
> These are no doubt useful operations. However, as any student of
> defensive programming knows, operations that you know will succeed
> don't, often enough to be a problem. Since these components must ensure
> their internal consistency whenever possible, these operations would
> still have to check that they succeed, and raise an appropriate
> exception if they don't.
There may be some misunderstanding here. The Success parameter merely
indicates whether the key was inserted during *this* insertion, not
whether the key was inserted into the container. If Success returns
False, this simply means that the key was already in the container.
So no matter what Success returns, you still have a guarantee that the
key is in the map.
If this is a map, and it's important that the element associated with
the key is always stored in the map (even if the key is already in the
map), then use Replace.
****************************************************************
From: Jeffrey Carter
Sent: Thursday, July 1, 2004 7:54 PM
This is an un-Ada-like way to do things, and I'm sorry I didn't realize
this earlier. Insert might perform a replacement if the key already
exists, or it might consider it an error and raise an exception, but to
return a Boolean flag is too C-like for my tastes.
****************************************************************
From: Pascal Leroy
Sent: Friday, July 2, 2004 2:33 AM
I am starting to feel that way too, and I wish I had noticed this earlier.
The notion of an out parameter that you can drop on the floor if you like
looks like a real safety issue to me. It is all too easy to forget to
test this parameter. I know that Matt's philosophy is to trust the
programmer, but he has been repeatedly chided by the ARG for this.
My preference would be to change the specification of Insert as follows:
procedure Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Allow_Replacement : in Boolean;
Position : out Cursor);
If Allow_Replacement is True, Insert will replace any existing entry in
the map with the given key/element pair. If Allow_Replacement is False,
Insert will raise an exception if the key is already in the map. In the
absence of an exception, Position will denote the newly inserted/replaced
entry.
I realize that I advocated avoiding changes to the specification, but this
AI is going to be shot down by WG9 if it contains safety holes.
****************************************************************
From: Marius Amado Alves
Sent: Friday, July 2, 2004 5:33 AM
I don't see a safety hole here, just a different style of doing the same
thing. Remember the "Success" parameter does not reflect some kind of
'total' success of the operation, just that the element was already
there. Other, really abnormal, conditions raise exceptions as expected.
Personally I'm even OK with the occasional Dont_Care dummy variable. But
the solution to this is trivial, just add the operation variants without
the out parameters as 'proxies' with the expected implementation e.g.
procedure Insert (Container, Item) is
Dont_Care : Boolean;
Dont_Need : Cursor;
begin
Insert (Container, Item, Dont_Care, Dont_Need);
end;
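[Editor's note: A compilable sketch of the proxy idea above, written against
the Ada.Containers.Ordered_Sets package as later standardized (an assumption;
the draft discussed here differs in detail). The procedure name
Insert_Quietly is hypothetical and was chosen to avoid clashing with any
library operation.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Proxy_Insert_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Integer);
   use Integer_Sets;
   use type Ada.Containers.Count_Type;

   --  Wrapper that hides the two out parameters from the caller.
   procedure Insert_Quietly (Container : in out Set; Item : in Integer) is
      Dont_Need : Cursor;
      Dont_Care : Boolean;
   begin
      Insert (Container, Item, Dont_Need, Dont_Care);
   end Insert_Quietly;

   S : Set;
begin
   Insert_Quietly (S, 42);
   Insert_Quietly (S, 42);   --  already present: no effect, no exception

   pragma Assert (S.Length = 1);
   pragma Assert (S.Contains (42));

   Ada.Text_IO.Put_Line
     ("Length:" & Ada.Containers.Count_Type'Image (S.Length));
end Proxy_Insert_Demo;
```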
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 8:03 AM
> The notion of an out parameter that you can drop on the floor
> if you like looks like a real safety issue to me.
There are no safety issues.
If I have a set of integers, and I do this:
Insert (S, 42, C, B);
Insert (S, 42, C, B);
Then in the first call, B returns True, and in the second case, B returns
False. There are no errors.
What people have been asking for is the ability to say this:
Insert (S, 42);
Insert (S, 42);
Which everybody seems to agree is perfectly reasonable.
> It is all too easy to forget to test this parameter.
The Boolean parameter indicates whether the key was already in the
container. There are many times for which there is no reason to test the
return value, and that is why people are asking for an overloading of insert
that doesn't have the extra parameters.
> I know that
> Matt's philosophy is to trust the programmer, but he has been
> repeatedly chided by the ARG for this.
True, but there is no safety issue here.
> My preference would be to change the specification of Insert
> as follows:
>
> procedure Insert (Container : in out Map;
> Key : in Key_Type;
> New_Item : in Element_Type;
> Allow_Replacement : in Boolean;
> Position : out Cursor);
>
> If Allow_Replacement is True, Insert will replace any
> existing entry in the map with the given key/element pair.
We have a Replace operation already for maps that has that semantics. (I
was toying with the idea that it might be nice to have a Replace for sets
too.)
> If Allow_Replacement is False, Insert will raise an exception
> if the key is already in the map. In the absence of an
> exception, Position will denote the newly inserted/replaced entry.
The Allow_Replacement parameter is an example of "control coupling." If you
want replacement behavior, just call Replace!
> I realize that I advocated avoiding changes to the
> specification, but this AI is going to be shot down by WG9 if
> it contains safety holes.
There are no safety holes. The issue we had in Palma was with
Update_Element for sets (it would be possible to change the order relation
of the element), and Tucker suggested a change to remove any possible
erroneous behavior.
There is nothing wrong with Insert, except for the fact that we didn't
overload Insert to omit the position and status parameters. Mario gave some
examples of when that operation would be useful.
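[Editor's note: A small sketch of the two-insertion scenario described in
this message, assuming the Ada.Containers.Ordered_Sets package as later
standardized. Double_Insert_Demo and Integer_Sets are illustrative names.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Double_Insert_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Integer);
   use Integer_Sets;
   use type Ada.Containers.Count_Type;

   S : Set;
   C : Cursor;
   B : Boolean;
begin
   Insert (S, 42, C, B);
   pragma Assert (B);        --  42 was not yet in the set
   Ada.Text_IO.Put_Line ("First attempt:  " & Boolean'Image (B));

   Insert (S, 42, C, B);
   pragma Assert (not B);    --  42 was already there; no exception
   Ada.Text_IO.Put_Line ("Second attempt: " & Boolean'Image (B));

   --  Either way the cursor designates the element and the element is
   --  in the set exactly once.
   pragma Assert (Element (C) = 42);
   pragma Assert (S.Length = 1);
end Double_Insert_Demo;
```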
****************************************************************
From: Pascal Leroy
Sent: Friday, July 2, 2004 8:15 AM
> I don't see a safety hole here, just a different style of
> doing the same thing. Remember the "Success" parameter does
> not reflect some kind of 'total' success of the operation,
> just that the element was already there. Other, really
> abnormal, conditions raise exceptions as expected.
After calling Insert with some key/element pair, if Success is set to
False, the key/element pair is not really in the map. Instead, a pair
key/some-other-element is in the map. I see this as a violation of an
invariant of the map.
> But the solution to this is trivial, just add the
> operation variants without the out parameters as 'proxies'
This is trivial from the perspective of the implementers or of the
language description. It is _not_ trivial from the perspective of the
user of the container. More operations make the container more
complicated to use, you have to go back to the documentation to find out
the meaning of all these operations, and at the end of the day you are
less likely to use the container. The string packages are like that at
the moment: they contain so much stuff that I can never remember what's
there and what's not, so quite often I end up not using them.
****************************************************************
From: Robert A. Duff
Sent: Friday, July 2, 2004 8:37 AM
> procedure Insert (Container : in out Map;
> Key : in Key_Type;
> New_Item : in Element_Type;
> Allow_Replacement : in Boolean;
> Position : out Cursor);
The way I did this in my container packages is to have two routines:
one raises an exception if the key is not there, and the other
replaces with a new key=>value pair. You could call them
Insert and Replace.
I don't see the point of a routine that leaves the key associated with
the *old* value.
****************************************************************
From: Robert A. Duff
Sent: Friday, July 2, 2004 8:46 AM
> The way I did this in my container packages is to have two routines:
> one raises an exception if the key is not there, and the other
^^^^^^^^^
Oops! I meant it raises if the key *is* already there.
Sorry.
> replaces with a new key=>value pair. You could call them
> Insert and Replace.
Roughly the same thing works for mappings and for sets.
> I don't see the point of a routine that leaves the key associated with
> the *old* value.
****************************************************************
From: Marc Criley
Sent: Friday, July 2, 2004 9:07 AM
> This is trivial from the perspective of the implementers or of the
> language description. It is _not_ trivial from the perspective of the
> user of the container. More operations make the container more
> complicated to use, you have to go back to the documentation to find out
> the meaning of all these operations, and at the end of the day you are
> less likely to use the container. The string packages are like that at
> the moment: they contain so much stuff that I can never remember what's
> there and what's not, so quite often I end up not using them.
The abundance of operations has certainly not hindered the adoption of
the C++ STL or the JDK libraries, whose content and complexity far
exceed that of the existing and proposed Ada libraries. And this
despite those collections often having their functionality supplied not
just by a single class, but a whole inheritance hierarchy of classes.
Whether it's strings or containers, I expect myself and other
programmers to have a general familiarity with the services available,
and then to look at the docs (whether in a separate document or embedded
as comments associated with the declaration--my preference) for the
details. My response to being uncertain about the contents of a library
is not to forgo its use, but to go scan through it to see what's
available so I can determine if it's useful to me and then take
advantage of it if it is!
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 9:31 AM
> After calling Insert with some key/element pair, if Success is set to
> False, the key/element pair is not really in the map. Instead, a pair
> key/some-other-element is in the map. I see this as a violation of an
> invariant of the map.
First of all, it is *not* a violation of any map invariant, and the
behavior you describe is often exactly what we want.
I gave an example yesterday, when I re-wrote Mario's example.
A histogram is another example (see the !examples of the AI):
Frequency_Histogram : Word_Count_Histograms.Map;
...
procedure Log_Word (Word : in String) is
C : Cursor;
B : Boolean;
begin
Frequency_Histogram.Insert
(Key => Word,
New_Item => 0, --YES
Position => C,
Success => B);
declare
procedure Increment (Count : in out Integer) is
begin
Count := Count + 1;
end;
begin
Update_Element (C, Increment'Access);
end;
end Log_Word;
This example illustrates why Pascal is wrong. In the example, we
attempt to insert the key value Word and the element value 0. This
locution is quite deliberate.
If the word is already in the map, then this insertion returns False
without touching the word count. We then increment the existing count,
which is exactly what we want.
If the word is not already in the map, then this insertion returns True
and the value 0 is associated with that key. We then increment the
count, which gives it the value 1, which is exactly what we want.
Notice that in neither case was it necessary to interrogate the Boolean
return value. It can return True or False, but either value is correct.
The value returned simply reflects the state of the map for this
insertion, but this is state information we don't care about.
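[Editor's note: A complete, compilable variant of the histogram example. It
assumes the Ada.Containers.Indefinite_Hashed_Maps package and the
Ada.Strings.Hash function as later standardized; there the flag is named
Inserted, and Update_Element takes the container and a process procedure
that also receives the key. Histogram_Demo is an illustrative name.]

```ada
with Ada.Containers.Indefinite_Hashed_Maps;
with Ada.Strings.Hash;
with Ada.Text_IO;

procedure Histogram_Demo is
   package Word_Count_Histograms is
     new Ada.Containers.Indefinite_Hashed_Maps
       (Key_Type        => String,
        Element_Type    => Natural,
        Hash            => Ada.Strings.Hash,
        Equivalent_Keys => "=");
   use Word_Count_Histograms;

   Frequency_Histogram : Map;

   procedure Log_Word (Word : in String) is
      C : Cursor;
      B : Boolean;

      procedure Increment (Key : in String; Count : in out Natural) is
      begin
         Count := Count + 1;
      end Increment;
   begin
      --  Insertion attempt: stores 0 only if Word is new; otherwise the
      --  existing count is left untouched. B is deliberately not tested.
      Frequency_Histogram.Insert
        (Key      => Word,
         New_Item => 0,
         Position => C,
         Inserted => B);

      --  In both cases C designates the entry for Word.
      Frequency_Histogram.Update_Element (C, Increment'Access);
   end Log_Word;
begin
   Log_Word ("ada");
   Log_Word ("containers");
   Log_Word ("ada");

   pragma Assert (Frequency_Histogram.Element ("ada") = 2);
   pragma Assert (Frequency_Histogram.Element ("containers") = 1);

   Ada.Text_IO.Put_Line
     ("ada:" & Natural'Image (Frequency_Histogram.Element ("ada")));
end Histogram_Demo;
```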
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 9:52 AM
> I don't see the point of a routine that leaves the key associated with
> the *old* value.
See my last post containing the word count histogram for an example of
why you'd want to preserve the old value.
The only change we need to make to the API here is to add an overloading
for Insert that omits the position and success parameters.
At a minimum this overloading should be added to the sets. (The map has
a Replace operation, so we can probably leave the map alone.)
****************************************************************
From: Nick Roberts
Sent: Friday, July 2, 2004 11:05 AM
Blimey people, please listen to a guy who has spent a lot of time designing
these kinds of things.
You /must/ have an insertion operation which returns (in an out parameter) a
boolean (or some other kind of) flag, where the flag indicates what happened
(e.g. whether the key already existed or not), but does not raise an
exception either way. This is because there are lots of well-used algorithms
that rely (for their efficiency) on being able to quickly tell whether a key
exists, and to insert a new value (only) if it doesn't (and to know if it
did). This rules out indicating non-existing by raising an exception (way
too slow), and doing a separate check in advance means searching the tree or
hash table twice, which is too inefficient.
In Ada at least, it is certainly /nice/ to have another procedure which has
no flag, and which simply raises an exception if the key already exists.
This is because there are many situations and algorithms that expect never
to insert the same key twice (and if it happens, this indicates a problem
with code or data). Obviously, this procedure could be written by the user
in terms of the former procedure, but it is so often used it seems justified
(to me) to provide it.
You also need a replacement operation that returns a flag and a deletion
operation that returns a flag, for the same reasons.
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 12:10 AM
> You /must/ have an insertion operation which returns (in an out parameter) a
> boolean (or some other kind of) flag, where the flag indicates what happened
> (e.g. whether the key already existed or not), but does not raise an
> exception either way. This is because there are lots of well-used algorithms
> that rely (for their efficiency) on being able to quickly tell whether a key
> exists, and to insert a new value (only) if it doesn't (and to know if it
> did). This rules out indicating non-existing by raising an exception (way
> too slow), and doing a separate check in advance means searching the tree or
> hash table twice, which is too inefficient.
I agree with all of this. The API supports all of this behavior.
> In Ada at least, it is certainly /nice/ to have another procedure which has
> no flag, and which simply raises an exception if the key already exists.
> This is because there are many situations and algorithms that expect never
> to insert the same key twice (and if it happens, this indicates a problem
> with code or data). Obviously, this procedure could be written by the user
> in terms of the former procedure, but it is so often used it seems justified
> (to me) to provide it.
The user can handle this very easily using the existing API:
declare
C : Cursor;
B : Boolean;
begin
Insert (S, E, C, B);
pragma Assert (B);
end;
or for a map:
declare
C : Cursor;
B : Boolean;
begin
Insert (M, K, E, C, B);
pragma Assert (B);
end;
I will state again that if Insert returns False, whether this is an
error depends on the application. I have given many !examples of why a
status of False is *not* an error.
> You also need a replacement operation that returns a flag and a deletion
> operation that returns a flag, for the same reasons.
You certainly don't need another replacement operation that passes back
a flag, since the flag-based insert provides a superset of the
functionality of replace. (Look at the implementation of map Replace,
which is simply a convenience function that is implemented in terms of
Insert).
Replace is just the same as:
declare
C : Cursor;
B : Boolean;
begin
Insert (M, K, E, C, B);
if not B then
Replace_Element (C, By => E);
end if;
end;
If you want a flag for replacement, then just use the algorithm above.
Delete is a borderline case. I probably wouldn't bother passing back a
flag since:
(1) in the cursor-based delete, the delete must succeed;
(2) in the key-based delete, you can test the value returned by Length
before and after the call to determine whether the key was deleted, so
no flag is necessary;
(3) instead of using the key-based delete, you can use Find and the
cursor-based delete as follows:
declare
C : Cursor := Find (M, K);
begin
if Has_Element (C) then
Delete (M, C);
end if;
end;
I will repeat my position on this matter: the only change we need to
make here is to add an Insert for sets that omits the position and
success parameters, and possibly add a Replace operation for sets. The
map is adequate as is.
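[Editor's note: A sketch showing how a user could layer the operations
requested earlier in the thread on top of the flag-based Insert and the
cursor-based Delete. It assumes the Ada.Containers.Ordered_Sets package as
later standardized. Insert_New and Delete_If_Present are hypothetical
user-written helpers, not library operations, and an explicit raise is used
in place of pragma Assert so that the check does not depend on assertion
checking being enabled.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Layered_Operations_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Integer);
   use Integer_Sets;

   --  Insert that treats an already-present element as an error.
   procedure Insert_New (Container : in out Set; Item : in Integer) is
      C        : Cursor;
      Inserted : Boolean;
   begin
      Insert (Container, Item, C, Inserted);
      if not Inserted then
         raise Constraint_Error with "element already present";
      end if;
   end Insert_New;

   --  Delete that reports whether anything was removed, using Find and
   --  the cursor-based Delete.
   procedure Delete_If_Present
     (Container : in out Set;
      Item      : in     Integer;
      Deleted   :    out Boolean)
   is
      C : Cursor := Find (Container, Item);
   begin
      Deleted := Has_Element (C);
      if Deleted then
         Delete (Container, C);
      end if;
   end Delete_If_Present;

   S       : Set;
   Raised  : Boolean := False;
   Deleted : Boolean;
begin
   Insert_New (S, 42);

   begin
      Insert_New (S, 42);   --  second insertion of the same element
   exception
      when Constraint_Error =>
         Raised := True;
   end;
   pragma Assert (Raised);

   Delete_If_Present (S, 42, Deleted);
   pragma Assert (Deleted);

   Delete_If_Present (S, 42, Deleted);
   pragma Assert (not Deleted);
   pragma Assert (S.Is_Empty);

   Ada.Text_IO.Put_Line ("Layered operations completed");
end Layered_Operations_Demo;
```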
****************************************************************
From: Robert A. Duff
Sent: Friday, July 2, 2004 12:29 PM
Nick Roberts says:
> You /must/ have an insertion operation which returns (in an out parameter) a
> boolean (or some other kind of) flag, where the flag indicates what happened
> (e.g. whether the key already existed or not), but does not raise an
> exception either way. This is because there are lots of well-used algorithms
> that rely (for their efficiency) on being able to quickly tell whether a key
> exists, and to insert a new value (only) if it doesn't (and to know if it
> did). ...
I find that convincing.
****************************************************************
From: Adam Beneschan
Sent: Friday, July 2, 2004 1:19 PM
Marius Amado Alves wrote:
> > I realize that I advocated avoiding changes to the specification, but this
> > AI is going to be shot down by WG9 if it contains safety holes.
>
> I don't see a safety hole here, just a different style of doing the same
> thing. Remember the "Success" parameter does not reflect some kind of
> 'total' success of the operation, just that the element was already
> there. Other, really abnormal, conditions raise exceptions as expected.
and Matthew Heaney wrote:
> If I have a set of integers, and I do this:
>
> Insert (S, 42, C, B);
> Insert (S, 42, C, B);
>
> Then in the first call, B returns True, and in the second case, B returns
> False. There are no errors.
I agree that this shouldn't necessarily be considered an error (it
depends on the application), but doesn't this indicate that the
"Success" parameter is misnamed? The opposite of "Success" is
"Failure", which does (to me) carry the connotation of "something
going WRONG", i.e. an error.
And sorry, I don't have a better suggestion. Something like
We_Were_Able_To_Do_The_Insertion_Because_It_Wasnt_Already_There would
be a more descriptive name but suffers from other flaws, such as being
about three times too long :)
I haven't been following this thread religiously, so my apologies if
this ground has been covered already......
****************************************************************
From: Marius Amado Alves
Sent: Friday, July 2, 2004 4:22 PM
> I agree that this shouldn't necessarily be considered an error (it
> depends on the application), but doesn't this indicate that the
> "Success" parameter is misnamed?
Yes. I had thought about this before. I almost sent an illustration
similar to yours, I think it was
Insert_If_Not_Already_There_Otherwise_Let_It_Be
(... It_Was_Not_There_So_I_Have_Inserted : Boolean)
and in the meanwhile I thought of Proper_Insertion or New_Element
instead of Success. And the bloody operation is really Ensure_Inserted,
no? And to this day I'm still a big fan of "Put" and "Get" (instead of
Insert and Element). But I still don't advise any changes now, for the
reasons Pascal gave... and took back :-)
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 5:35 PM
> I agree that this shouldn't necessarily be considered an
> error (it depends on the application), but doesn't this
> indicate that the "Success" parameter is misnamed? The
> opposite of "Success" is "Failure", which does (to me) carry
> the connotation of "something going WRONG", i.e. an error.
The boolean parameter simply conveys information about this insertion. My
model is that the call is really an insertion attempt, and depending on the
current state of the container the attempt can succeed or the attempt can
fail. The fact that an insertion attempt "fails" does not imply that
there's an error. There would only be an error if the post-condition could
not be satisfied, in which case an exception would be raised. But as we
have seen, the post-condition is satisfied (because the element is already
in the set), so there is no error and hence no exception either.
This behavior is similar to atomically grabbing a lock, and immediately
returning if the resource is already locked. See for example the Win32 API
function TryEnterCriticalSection.
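[Editor's note: A minimal sketch of the "insertion attempt" model described above, written against Ada.Containers.Ordered_Sets as later standardized in Ada 2005, where the Boolean out parameter is named Inserted rather than Success. The procedure name and the values are illustrative only. The checks rely on assertions being enabled (for example, -gnata with GNAT).]

```ada
with Ada.Containers.Ordered_Sets;

procedure Insertion_Attempt_Demo is
   package Integer_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Integer_Sets;
   use type Ada.Containers.Count_Type;

   S        : Set;
   Position : Cursor;
   Inserted : Boolean;
begin
   --  First attempt: 42 is absent, so it is added and the flag is True.
   S.Insert (New_Item => 42, Position => Position, Inserted => Inserted);
   pragma Assert (Inserted);

   --  Second attempt: 42 is already present. The set is unchanged, the
   --  flag is False, and no exception is raised.
   S.Insert (New_Item => 42, Position => Position, Inserted => Inserted);
   pragma Assert (not Inserted);

   --  Either way the post-condition holds: 42 is in the set, and
   --  Position designates it.
   pragma Assert (Element (Position) = 42);
   pragma Assert (S.Length = 1);
end Insertion_Attempt_Demo;
```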
****************************************************************
From: Nick Roberts
Sent: Friday, July 2, 2004 1:29 PM
> The user can handle this very easily using the existing API:
...
Yes, but it would be quite a bit neater and easier to be able to write the
one line:
Insert (S, E);
for a set, or:
Insert (M, K, E);
for a map, and this functionality is quite often required in practice.
>> You also need a replacement operation that returns a flag and a
>> deletion operation that returns a flag, for the same reasons.
>
> You certainly don't need another replacement operation that passes
> back a flag, since the flag-based insert provides a superset of the
> functionality of replace.
This isn't the functionality I was thinking of.
> Replace is just the same as:
>
> declare
> C : Cursor;
> B : Boolean;
> begin
> Insert (M, K, E, C, B);
>
> if not B then
> Replace_Element (C, By => E);
> end if;
> end;
The replacement operation that I was thinking of would never insert a new
key-value pair, it would either: replace the value for a given key, and
return 'true' for a flag 'exists'; do nothing and return 'false' for the
flag. The above code is not equivalent to this.
The following code would, I think, be equivalent to what I intended:
declare
C : Cursor := Find (M, K);
begin
if Has_Element (C) then
Replace_Element (C, By => E);
end if;
end;
However, it would be slightly neater and easier to be able to write
something like:
Replace_When_Exists (M, K, E, B);
> Delete is a borderline case. I probably wouldn't bother passing
> back a flag since:
> ...
> (2) in the key-based delete, you can test the value returned by
> Length before and after the call to determine whether the key was
> deleted, so no flag is necessary;
That's fine, since the Length is then acting as a kind of flag. Maybe not
the neatest solution (there seems to be a danger of somewhat obfuscated code).
> (3) instead of using the key-based delete, you can use Find and
> the cursor-based delete as follows:
>
> declare
> C : Cursor := Find (M, K);
> begin
> if Has_Element (C) then
> Delete (M, C);
> end if;
> end;
But again it would be slightly neater and easier to write something like:
Delete_When_Exists (S, E, B);
for a set, or:
Delete_When_Exists (M, K, B);
for a map.
> I will repeat my position on this matter: the only change we need
> to make here is to add an Insert for sets that omits the position
> and success parameters, and possibly add a Replace operation for
> sets. The map is adequate as is.
I agree with this statement, but I think there is a marginal argument for
the addition of replacement and deletion operations with a flag.
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 5:22 PM
> Yes, but it would be quite a bit neater and easier to be able
> to write the
> one line:
>
> Insert (S, E);
> for a set, or:
Yes, but you want this statement to raise an exception if E is already in
set S. This is *not* what I want.
The reason we disagree is because we have different pre- and post-conditions
for set insertion. Matt's pre- and post-conditions are:
procedure Insert (S : in out Set; E : in ET);
--pre: True
--post: Is_In (S, E)
That's why in Matt's universe there are no exceptions: the precondition is
as weak as possible, and the post-condition guarantees that the element is
in the set. If E is already in S, then the post-condition is satisfied, and
when the post-condition is satisfied then there's no reason to raise an
exception.
However, in Nick's universe the pre-condition is:
procedure Insert (S : in out Set; E : in ET);
--pre: not Is_In (S, E)
--post: Is_In (S, E)
If the element is already in the set, then the precondition has been
violated, and so an exception is raised. However, this behavior doesn't
make a lot of sense, since the invariants of the set abstraction are
preserved even if we were to weaken the pre-condition (as in Matt's
universe).
A note on exception behavior: In general, if a pre-condition is violated
(and the operation detects this), then it is appropriate to raise an
exception, in order to preserve the integrity of the abstraction, and to
signal the fact that the post-condition cannot be satisfied. The strange
thing about Nick's semantics is that the post-condition is satisfied even if
the pre-condition isn't, so what's the point of having an exception? The
effect of the call is the same either way.
However, because we want to be good citizens, we're not supposed to violate
pre-conditions (the exception is there to remind us to change our bad
behavior), so insertion into a set would have to be written like this:
declare
C : Cursor := Find (S, E);
begin
if not Has_Element (C) then
Insert (S, E);
end if;
end;
But now of course we have doubled the amount of work, since the work done by
Insert simply duplicates the work of Find, which is precisely what we were
trying to avoid!
At the end of the day, an insertion operation that omits the cursor and
boolean parameters should have the same behavior as the insertion operation
that includes those parameters. If you want insertion that omits the
parameters to have a different behavior, then the operation should have a
different name. This is precisely why the map operation sans cursor and
boolean is named "Replace" instead of "Insert".
> Insert (M, K, E);
>
> for a map, and this functionality is quite often required in practice.
I have *never* needed an insertion operation to raise an exception,
especially when I have an insertion operation that reports whether the
insertion succeeded. I don't find the argument "often required in practice"
very convincing, especially since we have had many actual examples of code
from me and others that specifically don't need or want the exception.
Instead of calling Insert as above (and getting an exception), then just
call Replace:
Replace (M, K, E);
That does everything that the insertion operation above does, but without
the exception.
****************************************************************
From: Matthew Heaney
Sent: Friday, July 2, 2004 8:56 PM
BTW: The duplication of search overhead could be avoided in the code
fragment above if the API had an insert with hint:
declare
C : Cursor := Ceiling (S, E);
begin
if not Has_Element (C) or else E < C then
S.Insert (Hint => C, New_Item => E);
end if;
end;
The hint form of insertion guarantees that insertion is O(1) if the hint is
useful. This property is satisfied by the result of the Ceiling function.
****************************************************************
From: Nick Roberts
Sent: Friday, July 2, 2004 9:28 PM
> ...
> Yes, but you want this statement to raise an exception if E is
> already in set S. This is *not* what I want.
Heh. But it /is/ what I want! Actually, for set insertion, I think both
operations -- raise exception if already there, do nothing if already
there -- would be nice. I would kinda expect the latter to be named
something like 'union', but that's pretty cosmetic.
The reason I would like the insertion that would raise an exception is,
for example, if I were reading a list of values from a file (or any serial
source), and I wanted to check that there were no duplicates. On the other
hand, of course, if I just wanted to build up a set and I didn't care
about duplicates, I'd want the other kind of insertion.
>> Insert (M, K, E);
>>
>> for a map, and this functionality is quite often required in
>> practice.
>
> I have *never* needed an insertion operation to raise an
> exception, especially when I have an insertion operation that
> reports whether the insertion succeeded. I don't find the
> argument "often required in practice" very convincing,
> especially since we have had many actual examples of code
> from me and others that specifically don't need or want the
> exception.
Well, all I can say is that I have been writing real application programs
for business, science, and the military, for many, many years, and it
has often been a requirement for me. The typical scenario is that I've got
a file (or other serial source) to read into a map (or equivalent), and I
must check that there are no duplicates (of key).
****************************************************************
From: Matthew Heaney
Sent: Saturday, July 3, 2004 1:05 AM
> The reason I would like the insertion that would raise an exception is,
> for example, if I were reading a list of values from a file (or any serial
> source), and I wanted to check that there were no duplicates.
Fine, then use the Success parameter to check that there are no duplicates.
> Well, all I can say is that I have been writing real application programs
> for business, science, and the military, for many, many years, and it
> has often been a requirement for me. The typical scenario is that I've got
> a file (or other serial source) to read into a map (or equivalent), and I
> must check that there are no duplicates (of key).
Fine, then use the Success parameter to check that there are no duplicates.
****************************************************************
From: Nick Roberts
Sent: Saturday, July 3, 2004 4:37 AM
> Fine, then use the Success parameter to check that there
> are no duplicates. [x2]
Hehe. Yes, but that's not the point. The whole point of the original
suggestion (an insertion without a flag) was not that it would do
something that /cannot/ be done by the version with a flag, but that it
would be a bit more convenient in many typical cases.
It's merely the difference between:
Account: Account_Record;
...
while not End_of_File(F) loop
Read(F,Account);
Insert(M,Account.ID,Account.Balance);
end loop;
and:
Account: Account_Record;
Okay: Boolean;
...
while not End_of_File(F) loop
Read(F,Account);
Insert(M,Account.ID,Account.Balance,Okay);
raise Duplicate_Account when not Okay;
end loop;
so I would have to admit that it's almost a trivial convenience. But this
/is/ a scenario that occurs very often in practice, and I think that
actually justifies the inclusion of the non-flagged insertion.
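[Editor's note: A self-contained sketch of the flagged form of the loop above, using Ada.Containers.Ordered_Maps as later standardized in Ada 2005. The in-memory Input array stands in for the file, and Account_Record, Duplicate_Account and Load are hypothetical names for illustration. In the standardized library the three-parameter Insert propagates Constraint_Error when the key is already present, so the shorter loop is also expressible there. The checks rely on assertions being enabled.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Duplicate_Check_Demo is
   package Balance_Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Positive, Element_Type => Integer);
   use type Ada.Containers.Count_Type;

   type Account_Record is record
      ID      : Positive;
      Balance : Integer;
   end record;

   type Account_List is array (Positive range <>) of Account_Record;

   --  Hypothetical input standing in for the file; ID 7 appears twice.
   Input : constant Account_List :=
     ((ID => 7, Balance => 100),
      (ID => 9, Balance => 250),
      (ID => 7, Balance => -40));

   Duplicate_Account : exception;

   procedure Load (M : in out Balance_Maps.Map) is
      Position : Balance_Maps.Cursor;
      Inserted : Boolean;
   begin
      for I in Input'Range loop
         M.Insert (Input (I).ID, Input (I).Balance, Position, Inserted);
         --  The caller, not the library, decides a duplicate is an error.
         if not Inserted then
            raise Duplicate_Account;
         end if;
      end loop;
   end Load;

   M      : Balance_Maps.Map;
   Caught : Boolean := False;
begin
   begin
      Load (M);
   exception
      when Duplicate_Account =>
         Caught := True;
   end;

   --  The duplicate was detected and the first balance for ID 7 kept.
   pragma Assert (Caught);
   pragma Assert (M.Length = 2);
   pragma Assert (M.Element (7) = 100);
end Duplicate_Check_Demo;
```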
****************************************************************
From: Matthew Heaney
Sent: Saturday, July 3, 2004 11:41 AM
The point I was trying to make is that the library doesn't know whether the
statement:
Insert (M, K, E);
is an error, meaning that it should propagate an exception if key K is
already in map M. Only the library user can know whether a duplicate key is
an error. (We have seen examples of both interpretations.)
> It's merely the difference between:
[example snipped]
I would have handled a duplicate key by simply ignoring it. Or I would have
used Replace.
My argument is that the library should be neutral wrt duplicate key
behavior.
> so I would have to admit that it's almost a trivial
> convenience. But this
> /is/ a scenario that occurs very often in practice, and I think that
> actually justifies the inclusion of the non-flagged insertion.
It's helpful to differentiate map and sets here. Everyone seems to agree
that the set statement:
Insert (S, E);
makes sense, since if E is already in the set then the post-condition is
satisfied.
The debate is about how to interpret the map statement:
Insert (M, K, E);
There are two possible interpretations if K is already in M:
(1) This is an error, and a duplicate key exception is propagated.
(2) This is not an error, and there are no exceptions.
You could justify (1) on the grounds that since value E is not entered into
the map, then the caller should be alerted to this fact. However, a reason
to reject interpretation (1) is that you can simply call Replace to get that
behavior.
That leaves (2). This has the benefit of symmetry with the corresponding
insertion operation for sets. The meaning of these operations is "if the
key is already in the container, then do nothing." Another meaning is "this
is the same as the canonical insertion operation, except that the cursor and
boolean parameters are omitted."
Obviously I favor interpretation (2).
****************************************************************
From: Matthew Heaney
Sent: Saturday, July 3, 2004 1:12 PM
I think I've got to confess that I didn't expect Replace to have these
semantics, and I didn't read the AI carefully enough. Sorry.
I think perhaps, in the light of the lateness in the day, and the fact
that these container abstractions were always intended to be quite
low-level, upon which users would generally build higher-level
abstractions, it's not worth arguing about a few extra convenience
procedures too much.
****************************************************************
From: Randy Brukardt
Sent: Saturday, July 3, 2004 3:59 PM
...
> The debate is about how to interpret the map statement:
>
> Insert (M, K, E);
>
> There are two possible interpretations if K is already in M:
>
> (1) This is an error, and a duplicate key exception is propagated.
> (2) This is not an error, and there are no exceptions.
>
> You could justify (1) on the grounds that since value E is not
> entered into the map, then the caller should be alerted to this fact.
> However, a reason to reject interpretation (1) is that you can simply
> call Replace to get that behavior.
Ugh. Something called "Replace" should not have insertion semantics; that
is, replacing something that doesn't exist is an error in my view. Probably
the primary cause of confusion in this discussion is that "Replace" might do
an insert, and "Insert" might not do an insert. Both of these seem goofy to
me.
But, as Nick said, it's probably more important that these are consistent
and stable than that they match a particular world view (mine :-).
****************************************************************
From: Matthew Heaney
Sent: Saturday, July 3, 2004 4:47 PM
I realized after I composed my last message that another possibility is to
interpret the statement:
Insert (M, K, E);
as having the same behavior as the operation we're calling "Replace." That would
allow us to either get rid of the existing Replace operation, or keep it but give it
the (slightly different) semantics Nick described in his earlier post. (It
sounds like Randy's leaning that way already.)
****************************************************************
From: Pascal Leroy
Sent: Monday, July 5, 2004 4:03 AM
Randy wrote:
> Ugh. Something called "Replace" should not have insertion
> semantics; that is, replacing something that doesn't exist is
> an error in my view. Probably the primary cause of confusion
> in this discussion is that "Replace" might do an insert, and
> "Insert" might not do an insert. Both of these seem goofy to me.
Agreed. This is hopelessly confusing.
> But, as Nick said, it's probably more important that these
> are consistent and stable than that they match a particular
> world view (mine :-).
None of this sounds "consistent" or "stable" to me. At any rate the only
world view that matters is that of the Heads of Delegations who will vote
at the next WG9 meeting. In light of the discussion at the last meeting,
I see trouble ahead. But that may only be my inexperience...
****************************************************************
From: Marius Amado Alves
Sent: Monday, July 5, 2004 9:32 AM
Now you're scaring me! You mean the proposal seriously risks not passing
simply because of this?
Would a Note help?
Operations Replace and Insert have a slightly more complex semantics
than a direct interpretation of their names. Namely, the effect is
conditioned by the prior existence of the specified key:
                        Replace          Insert
------------------------------------------------------
Key is already there    replace item     no change
Key is not there yet    add key, item    add key, item
------------------------------------------------------
****************************************************************
From: Pascal Leroy
Sent: Monday, July 5, 2004 10:38 AM
I am saying (repeating, actually) that some countries have expressed
concerns regarding the safety of the containers library as it stood at the
time of the last WG9 meeting. I suspect that these countries will
ultimately oppose this AI if they think that there are safety issues.
Whether that will be the majority (which would effectively kill the AI) or
not is an interesting question. Whether the said countries would go so
far as to oppose the entire Amendment is another interesting question. As
you can imagine, we can just go forward and count the votes, but the
outcome may be unpleasant. It's much better to find a consensus before
the vote.
In the case at hand I am uncomfortable with the notion that Replace
sometimes has an insertion semantics and Insert sometimes has a no-op
semantics. My opinion doesn't count, however, as I am not voting at WG9.
Furthermore, I may just be overreacting. However, I have a feeling that
this dodgy semantics is going to be hard to swallow in some quarters.
Back to the technical discussion. As far as I can tell we have identified
five different behaviors, all of which make sense depending on the
application needs:
1 - Insert and fail if key is already in the map.
2 - Insert and replace element if key is already in the map.
3 - Insert and do nothing if key is already in the map.
4 - Replace and fail if key is not in the map.
5 - Replace and insert if key is not in the map.
My advice would be to provide all five behaviors using five subprograms
with clearly distinct names. The notion of an out parameter that you can
drop on the floor is sure to make people nervous. On the other hand,
no-one is going to argue that there is a safety problem if you called Foo
when you really wanted to call Bar.
Just my two cents...
****************************************************************
From: Jeffrey Carter
Sent: Monday, July 5, 2004 12:02 PM
There's also
6 - Replace and do nothing if key is not in the map.
> My advice would be to provide all five behaviors using five subprograms
> with clearly distinct names. The notion of an out parameter that you can
> drop on the floor is sure to make people nervous. On the other hand,
> no-one is going to argue that there is a safety problem if you called Foo
> when you really wanted to call Bar.
I'm not sure providing 6 different insert and replace operations is a
good idea, either. One of each, with behaviors that don't overlap,
combined with query operations that allow building the other 4, may be
the clearest approach. That would argue for the operations that fail.
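[Editor's note: A sketch of how the six behaviors listed above can be layered on the conditional (flagged) Insert plus Find, so that each performs a single search. It uses Ada.Containers.Ordered_Maps as later standardized in Ada 2005. All procedure and exception names here are hypothetical and do not come from the AI. The checks rely on assertions being enabled.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Six_Behaviors_Demo is
   package Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Integer, Element_Type => Integer);
   use Maps;

   Key_Exists, Key_Missing : exception;  --  hypothetical names

   --  1: insert; fail if the key is already in the map.
   procedure Insert_Or_Fail (M : in out Map; K, E : Integer) is
      C        : Cursor;
      Inserted : Boolean;
   begin
      M.Insert (K, E, C, Inserted);
      if not Inserted then
         raise Key_Exists;
      end if;
   end Insert_Or_Fail;

   --  2 and 5: insert; replace the element if the key is already there.
   procedure Insert_Or_Replace (M : in out Map; K, E : Integer) is
      C        : Cursor;
      Inserted : Boolean;
   begin
      M.Insert (K, E, C, Inserted);
      if not Inserted then
         M.Replace_Element (C, E);
      end if;
   end Insert_Or_Replace;

   --  3: insert; do nothing if the key is already in the map.
   procedure Insert_If_Absent (M : in out Map; K, E : Integer) is
      C        : Cursor;
      Inserted : Boolean;
   begin
      M.Insert (K, E, C, Inserted);
   end Insert_If_Absent;

   --  4: replace; fail if the key is not in the map.
   procedure Replace_Or_Fail (M : in out Map; K, E : Integer) is
      C : constant Cursor := M.Find (K);
   begin
      if not Has_Element (C) then
         raise Key_Missing;
      end if;
      M.Replace_Element (C, E);
   end Replace_Or_Fail;

   --  6: replace; do nothing if the key is not in the map.
   procedure Replace_If_Present (M : in out Map; K, E : Integer) is
      C : constant Cursor := M.Find (K);
   begin
      if Has_Element (C) then
         M.Replace_Element (C, E);
      end if;
   end Replace_If_Present;

   M      : Map;
   Raised : Boolean := False;
begin
   Insert_Or_Fail (M, 1, 10);
   Insert_If_Absent (M, 1, 11);        --  key present: no change
   pragma Assert (M.Element (1) = 10);

   Insert_Or_Replace (M, 1, 12);       --  key present: element replaced
   pragma Assert (M.Element (1) = 12);

   Replace_If_Present (M, 2, 20);      --  key absent: no change
   pragma Assert (not M.Contains (2));

   Replace_Or_Fail (M, 1, 13);
   pragma Assert (M.Element (1) = 13);

   begin
      Insert_Or_Fail (M, 1, 99);       --  key present: exception
   exception
      when Key_Exists =>
         Raised := True;
   end;
   pragma Assert (Raised);
end Six_Behaviors_Demo;
```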
****************************************************************
From: Jean-Pierre Rosen
Sent: Monday, July 5, 2004 12:13 PM
I'm a bit afraid of having too many subprograms...
Why not follow the example of the "Drop" parameter in Ada.Strings.Fixed,
i.e. having an enumeration specifying behaviour?
****************************************************************
From: Nick Roberts
Sent: Monday, July 5, 2004 12:16 PM
I can suggest an emergency alteration to the AI. The changes required
don't seem to be drastic. There are four cases covered here (cases 2 and 5
that Pascal suggested seem to be the same).
> 3 - Insert and do nothing if key is already in the map.
procedure Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Position : out Cursor;
Success : out Boolean);
If Length (Container) equals Size (Container), then Insert calls Resize to
resize Container to some larger value. Insert then uses Hash and Is_Equal_Key
to check if Key is already present in Container. If a key matches, Success
returns False and Position designates the element with the matching key.
Otherwise, Insert allocates a new node, initializes it to Key and New_Item, and
adds it to Container. Success returns True and Position designates the
newly-inserted node. Any exceptions raised during allocation are propagated
and Container is not modified.
[This is exactly as in the current AI.]
> 1 - Insert and fail if key is already in the map.
procedure Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Position : out Cursor);
[One possible wording is:]
Insert without a Success parameter is equivalent to Insert with a
Success parameter with the difference that if Success would have
been False then this operation propagates the exception
Insertion_Error.
[or another possible wording is:]
If Length (Container) equals Size (Container), then Insert calls Resize to
resize Container to some larger value. Insert then uses Hash and Is_Equal_Key
to check if Key is already present in Container. If a key matches, Insert
propagates the exception Insertion_Error and Position designates the element
with the matching key. Otherwise, Insert allocates a new node, initializes it
to Key and New_Item, and adds it to Container. Position designates the
newly-inserted node. Any exceptions raised during allocation are propagated
and Container is not modified.
> 2 - Insert and replace element if key is already in the map.
> 5 - Replace and insert if key is not in the map.
procedure Insert_or_Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
Insert_or_Replace inserts Key and New_Item as per Insert, with the
difference that if Key is already in the map, then this operation
assigns New_Item to the element associated with Key. Any exceptions
raised during assignment are propagated.
[This procedure is named Replace in the current AI.]
> 4 - Replace and fail if key is not in the map.
procedure Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
Replace assigns New_Item to the element associated with Key. If Key is
not already in the map, then this operation propagates the exception
Replacement_Error, and does not perform any assignment. Any exceptions
raised during assignment are propagated.
[This procedure has the same profile as Replace in the current AI, but the
wording is changed to provide the exception raising semantics.]
Instead of the name 'Insert_or_Replace' I have suggested, a name such as
'Emplace' might be considered a little more succinct.
We need to add the two exceptions Insertion_Error and Replacement_Error to
the base containers package:
~~~
The library package Containers has the following declaration:
package Ada.Containers is
pragma Pure;
type Hash_Type is mod <implementation-defined>;
type Size_Type is range 0 .. <implementation-defined>;
Insertion_Error, Replacement_Error: exception;
end Ada.Containers;
Hash_Type represents the range of the result of a hash function. Size_Type
represents the (potential or actual) size (number of elements) of a
container.
Insertion_Error is raised when insertion into a container fails.
Replacement_Error is raised when replacement of a value in a container
fails.
~~~
I actually think we should refrain from trying to add any further
operations to the AI now, since there could be a combinatorial explosion
due to the other permutations (e.g. supplied value versus default value
for insertion, fail silently versus raise exception versus return a flag
for deletion). I guess it'll be difficult to make any changes at all.
****************************************************************
From: Nick Roberts
Sent: Monday, July 5, 2004 12:37 PM
Possibly the following procedure should also be added:
~~~
procedure Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Success : out Boolean);
~~~
I'll write a wording later. The idea is that it assigns New_Item if Key
exists and sets Success to True, otherwise it sets Success to False. It
could be argued that this procedure would not be much better than finding
the key into a cursor, and then testing the cursor (and doing a
Replace_Element if it is not No_Element).
> I'm not sure providing 6 different insert and replace operations
> is a good idea, either. One of each, with behaviors that don't
> overlap, combined with query operations that allow building the
> other 4, may be the clearest approach. That would argue for the
> operations that fail.
I think this is basically right, and if we think of adding any operations
at all at this stage, it should be as few as possible. I think it's
probably okay for the Delete to silently do nothing if the given Key does
not exist, although this behaviour might be surprising to some programmers.
****************************************************************
From: Marius Amado Alves
Sent: Monday, July 5, 2004 12:35 PM
> I'm not sure providing 6 different insert and replace operations is a
> good idea, either. One of each, with behaviors that don't overlap,
> combined with query operations that allow building the other 4, may be
> the clearest approach. That would argue for the operations that fail.
Whatever you do, remember that 'algebraic' behavior is not the only
factor in the design, there is also the fact that these operations
perform search, combined with the fact that many idioms can profit from
this fact for efficiency if that result is made known or used directly
by the 'strange' semantics (e.g. Replace doing an insertion), usually to
avoid searching twice, as already well exemplified. The current
operations represent well a set of primitives that takes these 'unpure'
but required factors into consideration. Rename them Extended_Replace
and Extended_Insert and provide the 'pure' operations with the
'unextended' names perhaps.
****************************************************************
From: Nick Roberts
Sent: Monday, July 5, 2004 12:56 PM
On Mon, 5 Jul 2004 19:13:00 +0200, Jean-Pierre Rosen <rosen@adalog.fr>
wrote:
> I'm a bit afraid of having too many subprograms...
Yes, I think we all are!
> Why not follow the example of the "Drop" parameter in
> Ada.Strings.Fixed, i.e. having an enumeration specifying
> behaviour?
I think that's a reasonable idea. We could add:
type Failure_Action is (Ignore, Error);
to the base package Ada.Containers and then add a parameter such as:
On_Failure: Failure_Action := Error
to the Insert, Replace, and Delete operations that didn't have a Success
parameter. This parameter could be named 'When_Exists' or 'When_Absent' as
appropriate.
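[Editor's note: A sketch of the enumeration-controlled insertion suggested above, written as a wrapper over Ada.Containers.Ordered_Maps as later standardized in Ada 2005. Failure_Action and Insertion_Error follow the names proposed in this thread but are declared locally here. Insert_With_Policy is a hypothetical name, chosen to avoid overloading the library's own Insert. The checks rely on assertions being enabled.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Failure_Action_Demo is
   package Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Integer, Element_Type => Integer);

   --  Declared locally for illustration; the proposal would place these
   --  in the base package Ada.Containers.
   type Failure_Action is (Ignore, Error);
   Insertion_Error : exception;

   procedure Insert_With_Policy
     (M           : in out Maps.Map;
      K, E        : Integer;
      When_Exists : Failure_Action := Error)
   is
      C        : Maps.Cursor;
      Inserted : Boolean;
   begin
      M.Insert (K, E, C, Inserted);
      --  The control parameter selects what an existing key means.
      if not Inserted and then When_Exists = Error then
         raise Insertion_Error;
      end if;
   end Insert_With_Policy;

   M      : Maps.Map;
   Raised : Boolean := False;
begin
   Insert_With_Policy (M, 1, 10);
   Insert_With_Policy (M, 1, 11, When_Exists => Ignore);  --  no change
   pragma Assert (M.Element (1) = 10);

   begin
      Insert_With_Policy (M, 1, 12);  --  default policy: Error
   exception
      when Insertion_Error =>
         Raised := True;
   end;
   pragma Assert (Raised);
   pragma Assert (M.Element (1) = 10);
end Failure_Action_Demo;
```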
****************************************************************
From: Marius Amado Alves
Sent: Monday, July 5, 2004 2:22 PM
Someone had already proposed a 'control coupled' design, and with only
Boolean types, which is better than this because it has fewer special
types to manage. (Those special types in the standard String operations
are a pain: each time I need them I go to the RM to find out
exactly which package I have to "with", and then I have to use "use"
to keep my sanity when writing the calls.)
And some say control coupling is a bad thing, whatever types you use.
****************************************************************
From: Nick Roberts
Sent: Monday, July 5, 2004 3:50 PM
I too think control-coupled procedures are often a bad idea, generally
because they can make the implementations of those procedures a logical
tangle (of deeply nested ifs and cases), which can be bad for correctness
and maintenance, and sometimes significantly bad for performance.
On the other hand, a proliferation of procedures, where you have many
different variations on a theme, is also a bad idea. Indeed, in this
case, I'd say a worse idea.
As for using Booleans, I've tried that myself in the past and gradually
come to the conclusion that using enumerated types with (more) meaningful
names is usually preferable. The name availability problem can be
ameliorated by techniques such as replication within the generic
specification. For example, one could add the declarations:
subtype Failure_Action is Ada.Containers.Failure_Action;
Ignore: constant Failure_Action := Ada.Containers.Ignore;
Error: constant Failure_Action := Ada.Containers.Error;
to the specification of Ada.Containers.Hashed_Maps.
****************************************************************
From: Matthew Heaney
Sent: Monday, July 5, 2004 9:11 PM
> I am saying (repeating, actually) that some countries have
> expressed concerns regarding the safety of the containers
> library as it stood at the time of the last WG9 meeting.
What "safety" issue? The only one I can think of is the behavior of
Update_Element for sets. Tucker suggested a change to remove the erroneous
behavior and now all is well. Were there some others?
> In the case at hand I am uncomfortable with the notion that
> Replace sometimes has an insertion semantics and Insert
> sometimes has a no-op semantics.
Then why not just rename "Replace" to "Insert" instead? We would then have:
(1) Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type;
Position : out Cursor;
Success : out Boolean);
If the Key is not in the map Container, then insert a new value pair Key and
New_Item in the map, and set Success to True. If the Key is already in the
map, then set Success to False and don't do anything else.
This is what we have right now. This has the same semantics as the
similarly-named operation in the STL.
(2) Insert (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
If Key is not in the map, then insert a new value pair Key and New_Item in
the map. If Key is already in the map, then assign New_Item to the element
associated with Key.
We have this operation in the API already, but it's called "Replace".
We could also add another operation, with behavior not in the current API:
(3) Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
If Key is not in the map, then do nothing. If Key is already in the map,
then assign New_Item to the element associated with Key.
> My opinion doesn't count,
> however, as I am not voting at WG9. Furthermore, I may just
> be overreacting. However, I have a feeling that this dodgy
> semantics is going to be hard to swallow in some quarters.
I don't know which semantics are "dodgy," since these are the same semantics
(for Insert, anyway) as for the STL, which is already an ISO standard.
> Back to the technical discussion. As far as I can tell we
> have identified five different behaviors, all of which make
> sense depending on the application needs:
>
> 1 - Insert and fail if key is already in the map.
See (1) above.
> 2 - Insert and replace element if key is already in the map.
See (2) above.
> 3 - Insert and do nothing if key is already in the map.
Hmmm. What's the difference between "Insert and fail" and "Insert and do
nothing"? Same as (1) above?
> 4 - Replace and fail if key is not in the map.
If by "fail" you mean "do nothing," then see (3) above.
> 5 - Replace and insert if key is not in the map.
I don't know what this means. Sounds like (2) above.
> My advice would be to provide all five behaviors using five
> subprograms with clearly distinct names.
The consensus seems to be it's a problem that the operation we're now
calling "Replace" can insert a new key if it doesn't already exist in the
map. That's an easy problem to solve: just name the operation "Insert"
instead.
> The notion of an
> out parameter that you can drop on the floor is sure to make
> people nervous.
I can't imagine why. Nick gave a nice summary of the rationale for such an
operation, and I have given several examples of why conditional insertion is
both useful and necessary.
Note again that this is exactly what the STL does.
> On the other hand, no-one is going to argue
> that there is a safety problem if you called Foo when you
> really wanted to call Bar.
As far as I can tell, some people have objected to the fact that the
operation named "Replace" can insert a new key. So either rename it
"Insert," or just get rid of it. This is a very minor change.
****************************************************************
From: Matthew Heaney
Sent: Monday, July 5, 2004 9:20 PM
> I'm a bit afraid of having to many subprograms...
> Why not follow the example of the "Drop" parameter in
> Ada.Strings.Fixed, i.e. having an enumeration specifying behaviour?
As others have already pointed out, this is an example of "control
coupling." This is my least favorite aspect of the Ada.Strings.* API.
With the Insert we have now (the map operation that has 5 parameters), you
can build any of the other behaviors.
The operation named "Replace" is merely a convenience function, to either
insert a new key if it isn't in the map, or replace the current element
value if the key is already in the map. The issue seems to be that an
operation named "Replace" can insert a new key. All we need to do is rename
the operation "Insert".
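
[Editor's note: the following sketch is not part of the original mail. It
illustrates the claim that the other behaviors can be built from the
five-parameter Insert. It assumes the Ada.Containers.Ordered_Maps package
as later standardized in Ada 2005, where the Boolean out parameter is named
Inserted rather than Success and Replace_Element takes the container as its
first parameter. An ordered map is used only to avoid supplying a hash
function. The names Insert_Or_Assign, Assign_If_Present and Check are
hypothetical.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Insert_Variants is

   package Int_Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Integer, Element_Type => Integer);
   use Int_Maps;

   --  Raise Program_Error if an expectation does not hold.
   procedure Check (Condition : Boolean) is
   begin
      if not Condition then
         raise Program_Error;
      end if;
   end Check;

   --  Insert the pair, or overwrite the element if Key is present.
   --  Hypothetical name for the operation the mail calls "Replace".
   procedure Insert_Or_Assign
     (Container : in out Map;
      Key       : in     Integer;
      New_Item  : in     Integer)
   is
      Position : Cursor;
      Inserted : Boolean;
   begin
      Insert (Container, Key, New_Item, Position, Inserted);
      if not Inserted then
         Replace_Element (Container, Position, New_Item);
      end if;
   end Insert_Or_Assign;

   --  Overwrite only; do nothing if Key is absent (hypothetical name).
   procedure Assign_If_Present
     (Container : in out Map;
      Key       : in     Integer;
      New_Item  : in     Integer)
   is
      Position : constant Cursor := Find (Container, Key);
   begin
      if Has_Element (Position) then
         Replace_Element (Container, Position, New_Item);
      end if;
   end Assign_If_Present;

   M        : Map;
   Position : Cursor;
   Inserted : Boolean;

begin
   --  Conditional insertion: an existing element is left alone.
   Insert (M, 1, 10, Position, Inserted);
   Check (Inserted);
   Insert (M, 1, 99, Position, Inserted);
   Check (not Inserted and then Element (Position) = 10);

   --  Insert or overwrite.
   Insert_Or_Assign (M, 1, 11);
   Insert_Or_Assign (M, 2, 20);
   Check (Element (M, 1) = 11 and then Element (M, 2) = 20);

   --  Overwrite only: key 3 is absent, so nothing is inserted.
   Assign_If_Present (M, 2, 22);
   Assign_If_Present (M, 3, 30);
   Check (Element (M, 2) = 22 and then not Contains (M, 3));
end Insert_Variants;
```

Both helpers need only the conditional Insert, Find and Replace_Element, so
each costs the client a few lines of code.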
****************************************************************
From: Matthew Heaney
Sent: Monday, July 5, 2004 9:24 PM
> I can suggest an emergency alteration to the AI. The changes
> required
> don't seem to be drastic.
Well, adding new exceptions to the API is a very drastic change. It's also
completely unnecessary.
All we need to do is change the name of the operation "Replace" to "Insert",
and all is well.
****************************************************************
From: Matthew Heaney
Sent: Monday, July 5, 2004 9:30 PM
> I think this is basically right, and if we think of adding
> any operations
> at all at this stage, it should be as few as possible.
We don't need to add any new operations. Just rename "Replace" to "Insert".
(Note that I would be in favor of creating a new replacement-style operation
that has the semantics you mentioned in your post a couple of days ago, but
even that can be easily built from the Find and Replace_Element primitives.)
> I think it's
> probably okay for the Delete to silently do nothing if the
> given Key does
> not exist, although this behaviour might be surprising to
> some programmers.
I can't imagine why. The post-condition is that the key isn't in the map. If
the key isn't in the map before the call, then the post-condition is
satisfied. What's surprising about that?
****************************************************************
From: Pascal Leroy
Sent: Tuesday, July 6, 2004 2:15 AM
> > I am saying (repeating, actually) that some countries have
> expressed
> > concerns regarding the safety of the containers library as
> it stood at
> > the time of the last WG9 meeting.
>
> What "safety" issue? The only one I can think of is the
> behavior of Update_Element for sets. Tucker suggested a
> change to remove the erroneous behavior and now all is well.
> Were there some others?
If you don't mind, I am not going to disclose private discussions on a
public forum. I was only trying to wave a red flag.
"Safety" is not merely "erroneousness". Consider two changes that were
made by the ARG recently: (1) the definition of map equality was changed
to compare the key/element pairs, instead of only the elements; and (2)
functions Lower_Bound and Upper_Bound were made symmetrical. There were
no erroneousness issues in these cases; still, without these changes the
AI was sure to be dead-on-arrival, take my word for it. The reason is
that semantics that are "surprising" can very easily lead to programming
errors, so it is best to make the semantics as pure and "natural" as
possible, given the other constraints. Of course, what counts as
"surprising" is in the eye of the beholder to some extent. But the fact
that there has been so much discussion on Insert and Replace recently is
probably an indication that these operations are not exactly WYSIWYG.
> I don't know which semantics are "dodgy," since these are the
> same semantics (for Insert, anyway) as for the STL, which is
> already an ISO standard.
This is completely bogus. The fact that STL is an ISO standard is
irrelevant. AI-302 will be judged based on how well it preserves and
extends the "good properties" of Ada: safety, readability, portability,
etc. If it can be compatible with STL, so much the better, but if it
cannot, too bad. In particular, if the semantics of some operations are
felt to be inadequate, repeating the mantra "it's the same as STL" won't
help.
****************************************************************
From: Pascal Leroy
Sent: Tuesday, July 6, 2004 2:31 AM
Nick wrote:
> I too think control-coupled procedures are often a bad idea,
> generally because they can make the implementations of those
> procedures a logical tangle (of deeply nested ifs and cases),
> which can be bad for correctness and maintenance, and
> sometimes significantly bad for performance.
Note that we should really be designing this library for the user, not for
the implementer. After all, there will only be a handful of implementations
of these units (the compiler vendors, plus a few Matts here and there).
These implementations will hopefully be extensively tested by the ACATS.
So correctness and maintenance of the library itself is only a secondary
concern.
The real issue is correctness and maintenance of the code on the client
side. Here I don't necessarily see control coupling as bad.
Syntactically, there is very little difference between:
Insert_And_Replace_If_Present (...);
and:
Insert (Replace_If_Present => True, ...);
And in fact, judicious use of defaulted parameters can improve the
readability of the calls (you demonstrated how a defaulted parameter could
be used to force detection of errors by default). Furthermore, control
coupling makes it possible to dynamically/globally switch some options
(for instance, detect errors based on the value of an environment
variable), something which is hard to do with multiple entry points.
> On the other hand, a proliferation of procedures, where you
> have many different variations on a theme, are also a bad
> idea. Indeed, in this case, I'd say a worse idea.
>
> As for using Booleans, I've tried that myself in the past and
> gradually come to the conclusion that using enumerated types
> with (more) meaningful names is usually preferable.
I agree with Nick here.
****************************************************************
From: Cyrille Comar
Sent: Tuesday, July 6, 2004 4:42 AM
Pascal Leroy writes:
> It's much better to find a consensus before the vote.
I strongly agree. Better to have a consensus now, and an official basis
for Ada containers as part of the standard, than to wait for a de facto
standard to emerge that has not been widely discussed.
****************************************************************
From: Nick Roberts
Sent: Tuesday, July 6, 2004 6:33 AM
I will play devil's advocate, and make a proposal for changes to
the AI, based on the idea of having a parameter to control behaviour
when certain pre-conditions are not met.
Change the base package:
~~~
The library package Containers has the following declaration:
   package Ada.Containers is
      pragma Pure;
      type Hash_Type is mod <implementation-defined>;
      type Size_Type is range 0 .. <implementation-defined>;
      type Error_Action is (Ignore, Error);
      Key_Error : exception;
   end Ada.Containers;
Hash_Type represents the range of the result of a hash function. Size_Type
represents the (potential or actual) size (number of elements) of a
container.
Key_Error and Error_Action are used in conjunction with certain container
operations, for handling the situation when a key does or does not exist,
as described below.
~~~
All of the following changes are for the Ada.Containers.Hashed_Maps
generic package.
Add the following declarations into the package listing:
~~~
subtype Error_Action is Ada.Containers.Error_Action;
Ignore: constant Error_Action := Ada.Containers.Ignore;
Error: constant Error_Action := Ada.Containers.Error;
~~~
Add the following wording after that for the existing Insert:
~~~
   procedure Insert (Container  : in out Map;
                     Key        : in     Key_Type;
                     New_Item   : in     Element_Type;
                     Position   :    out Cursor;
                     Key_Exists : in     Error_Action := Error);
Insert without a Success parameter is equivalent to Insert with a
Success parameter, except that if Success would have been False, then
this operation does nothing if Key_Exists is Ignore, and propagates the
exception Key_Error if Key_Exists is Error.
~~~
Add the specification of this procedure into the package listing.
Rename the procedure 'Replace' as 'Insert_Or_Replace' in the package
listing and in the wording.
Add the following wording after that for Insert_Or_Replace:
~~~
   procedure Replace (Container  : in out Map;
                      Key        : in     Key_Type;
                      New_Item   : in     Element_Type;
                      Key_Absent : in     Error_Action := Error);
Replace assigns New_Item to the element associated with Key. If Key is
not already in the map, then this operation does not perform any
assignment; it does nothing if Key_Absent is Ignore, and propagates the
exception Key_Error if Key_Absent is Error. Any exceptions raised
during assignment are propagated.
~~~
Add the specification of this procedure into the package listing.
Replace the current wording for the Delete procedure with Key with:
~~~
   procedure Delete (Container  : in out Map;
                     Key        : in     Key_Type;
                     Key_Absent : in     Error_Action := Error);
Delete uses Hash and Is_Equal_Key to check if Key is present in
Container. If Key matches the key of a node, Delete removes the node
from the map and then deallocates the node. If Key is not already in the
map, then this operation does nothing if Key_Absent is Ignore, and
propagates the exception Key_Error if Key_Absent is Error.
AARM Notes: Delete should only compare elements that hash to the same
bucket in the hash table. Delete with Key_Absent=Ignore should work on
an empty map; nothing happens in that case.
~~~
Replace the current specification of this procedure in the package
listing with the specification above.
I think this idea raises an issue which needs consideration. Many other
operations raise Constraint_Error (e.g. if a key does not exist, or a
cursor is No_Element). One possibility is for Key_Error to be removed
from this change, and for Constraint_Error to be raised in its place.
Another possibility is to raise Key_Error instead of Constraint_Error in
many further places in the AI; in this case a more generalised name,
such as Container_Error, might be more appropriate. I feel the latter
option would probably be helpful to the user for debugging.
I'll happily suggest a set of changes to Ordered_Sets if the above seem
at all acceptable.
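
[Editor's note: the following sketch is not part of the original mail. It
shows how the proposed Error_Action parameter would behave from the client
side, by layering the proposed Insert and Replace over an existing map
package. It assumes Ada.Containers.Ordered_Maps as later standardized in
Ada 2005, chosen only to avoid supplying a hash function. Error_Action and
Key_Error are declared locally here rather than in Ada.Containers, and
Check is a hypothetical helper.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Error_Action_Demo is

   package Int_Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Integer, Element_Type => Integer);

   --  Declared locally for the sketch; the proposal puts these
   --  in package Ada.Containers.
   type Error_Action is (Ignore, Error);
   Key_Error : exception;

   procedure Check (Condition : Boolean) is
   begin
      if not Condition then
         raise Program_Error;
      end if;
   end Check;

   --  Proposed Insert: no Success parameter, behavior chosen by Key_Exists.
   procedure Insert
     (Container  : in out Int_Maps.Map;
      Key        : in     Integer;
      New_Item   : in     Integer;
      Position   :    out Int_Maps.Cursor;
      Key_Exists : in     Error_Action := Error)
   is
      Success : Boolean;
   begin
      Int_Maps.Insert (Container, Key, New_Item, Position, Success);
      if not Success and then Key_Exists = Error then
         raise Key_Error;
      end if;
   end Insert;

   --  Proposed Replace: never inserts, behavior chosen by Key_Absent.
   procedure Replace
     (Container  : in out Int_Maps.Map;
      Key        : in     Integer;
      New_Item   : in     Integer;
      Key_Absent : in     Error_Action := Error)
   is
      Position : constant Int_Maps.Cursor := Int_Maps.Find (Container, Key);
   begin
      if Int_Maps.Has_Element (Position) then
         Int_Maps.Replace_Element (Container, Position, New_Item);
      elsif Key_Absent = Error then
         raise Key_Error;
      end if;
   end Replace;

   M : Int_Maps.Map;
   C : Int_Maps.Cursor;

begin
   Insert (M, 1, 10, C);
   Insert (M, 1, 20, C, Key_Exists => Ignore);   --  key present: no-op
   Check (Int_Maps.Element (M, 1) = 10);

   begin
      Insert (M, 1, 30, C);                      --  default is Error
      Check (False);                             --  not reached
   exception
      when Key_Error =>
         null;
   end;

   Replace (M, 2, 5, Key_Absent => Ignore);      --  key absent: no-op
   Check (not Int_Maps.Contains (M, 2));
   Replace (M, 1, 11);
   Check (Int_Maps.Element (M, 1) = 11);
end Error_Action_Demo;
```

With the defaults, a call that omits the control parameter fails loudly, and
the silent behavior has to be requested explicitly at the call site.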
****************************************************************
From: Marius Amado Alves
Sent: Tuesday, July 6, 2004 8:23 AM
>>As for using Booleans, I've tried that myself in the past and
>>gradually come to the conclusion that using enumerated types
>>with (more) meaningful names is usually preferable.
>
> I agree with Nick here.
You guys need to take a deep breath or something :-) For the case at
hand an enumeration with only two values instead of a Boolean is just
useless baggage.
But I'm against *any* control coupling here. FWIW I agree with Matt's
set of proposals: simply rename Replace to Insert, maybe rename Success
to Proper_Insert or something, maybe add the flagless variant, keep the
rest of the map spec as is, and adjust the sets spec.
****************************************************************
From: Michael Yoder
Sent: Tuesday, July 6, 2004 11:35 AM
On the semantics of Insert: speaking mathematically, the notion that
"Insert might not insert" doesn't bother me. Insertion of an element
ought to be equivalent to finding the union with a singleton set, and
yes, that means if the element is already there the set doesn't change.
Still, I can accept that programmers' intuitions might not match those
of mathematicians in general or mine in particular. If I insert an ace
of spades into a poker hand already containing one, this is a situation
which (expressed delicately) indicates an error condition. I don't
find such analogies appropriate, but others might.
For sets, I suggest adding a procedure Insert_New. Speaking coarsely,
it acts just like Insert, but raises an exception if the inserted
element is already present.
For maps, I suggest these names for the five cases enumerated by Pascal:
On Jul 5, 2004, at 11:38 AM, Pascal Leroy wrote:
>
> Back to the technical discussion. As far as I can tell we have identified
> five different behaviors, all of which make sense depending on the
> application needs:
>
> 1 - Insert and fail if key is already in the map.
Insert_New
> 2 - Insert and replace element if key is already in the map.
Insert
> 3 - Insert and do nothing if key is already in the map.
Insert_If_New
> 4 - Replace and fail if key is not in the map.
Replace_Old
> 5 - Replace and insert if key is not in the map.
Replace
>
> My advice would be to provide all five behaviors using five subprograms
> with clearly distinct names. The notion of an out parameter that you can
> drop on the floor is sure to make people nervous. On the other hand,
> no-one is going to argue that there is a safety problem if you called Foo
> when you really wanted to call Bar.
****************************************************************
From: Matthew Heaney
Sent: Sunday, July 11, 2004 9:03 PM
> There are similar Insert procedures for both Ordered_Sets
> and Hashed_Maps with highly useful Position and Success
> parameters. Sometimes however, it seems somewhat disturbing
> to see declarations of variables for Position and Success
> that are not read because it is considered safe to ignore
> them. It might be known that Insert will succeed without
> surprises (ceteris paribus). Examples include adding initial
> values to a library level container, like 69 keywords, or
> adding known border values to ordered containers.
In the case of a set, the postcondition is satisfied no matter what value is
returned, so yes there should be a convenience operation that omits the cursor
and boolean parameters, since they only add syntactic noise.
In the case of a map, you already have a convenience function called
"Replace" that omits the cursor and boolean parameters, and it is defined to
replace the element associated with the key if the key already exists in the
map.
I had originally named the three-parameter insertion operation for maps
"Insert", but was concerned that developers would think that it has the same
behavior as the five-parameter Insert, but with the cursor and boolean
parameters omitted. Since the behavior of the three-parameter operation was
kind of like Replace_Element (the difference is that it can insert a new
key), I named the operation "Replace".
However, others have argued that the fact that it can insert a new key means
it should be named Insert, which is OK with me.
So ultimately your request is for adding a convenience operation to the set.
There is already a convenience operation for maps, but it is currently named
Replace.
> In a
> sense, this might also make Ordered_Sets and Hashed_Maps
> correspond more closely to Vectors and Doubly_Linkes_List
> with regard to Insert procedures.
Yes.
Here's some history. My original proposal was quite long, and the ARG asked
me to cut it down. As I was re-writing the proposal to make it shorter, I
looked for operations that were possible candidates for removal, and one of
the operations I removed was the two-parameter insert for sets. This
appears to have been a mistake, but it's no big deal to add it back in. We
deliberately deployed a reference implementation to discover API bugs early,
and this looks like one. (If you have an Ada 2005 compiler that lets you
compile children of package Ada, then you can use the a-c*.ad[sb] version of
the library. It's at the tigris site.)
> (When using maps, I sometimes think of them as sparse
> arrays. With arrays, I can just write
>
> ary(key) := value;
>
> and be done.)
This is exactly how the STL works:
ary[key] = value;
You can even do stuff like:
++ary[key];
and it's guaranteed to work, since the standard says that if key isn't in
the map then it is inserted, and its element is constructed with its default
value. The index operation then returns a reference to the element
(newly-inserted or not), which the increment operator then works on.
In the Ada case, you have to pass in the default value explicitly (see my
histogram example), but it works the same as the index operation above (the
syntax is different, of course):
   Ary.Insert (Key, 0, C, B);
   declare
      procedure Increment (I : in out Integer) is
      begin
         I := I + 1;
      end;
   begin
      Update_Element (C, Increment'Access);
   end;
This is equivalent to the C++ statement above. Note that the algorithm
above works no matter what value is returned for B. The only thing we
require is that the insertion be conditional; that is, if the key is already
in the map, then leave the associated element alone. It is certainly not an
"error" if B is false and the existing element value is left unmodified, and
in fact the algorithm above depends on this behavior.
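
[Editor's note: the following sketch is not part of the original mail. It
expands the fragment above into a complete character histogram. It assumes
Ada.Containers.Ordered_Maps as later standardized in Ada 2005, where
Update_Element also takes the container and its Process callback receives
both the key and the element. The sample text and the helper Check are
hypothetical.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Histogram is

   package Char_Counts is new Ada.Containers.Ordered_Maps
     (Key_Type => Character, Element_Type => Natural);

   Text     : constant String := "abracadabra";
   Counts   : Char_Counts.Map;
   Position : Char_Counts.Cursor;
   Inserted : Boolean;

   procedure Check (Condition : Boolean) is
   begin
      if not Condition then
         raise Program_Error;
      end if;
   end Check;

   --  Callback for Update_Element; the key is not needed here.
   procedure Increment (Key : in Character; Count : in out Natural) is
   begin
      Count := Count + 1;
   end Increment;

begin
   for I in Text'Range loop
      --  Conditional insertion: a new key starts at 0, an existing
      --  count is left alone. Either way Position designates the entry,
      --  so the value of Inserted is deliberately ignored.
      Char_Counts.Insert (Counts, Text (I), 0, Position, Inserted);
      Char_Counts.Update_Element (Counts, Position, Increment'Access);
   end loop;

   Check (Char_Counts.Element (Counts, 'a') = 5);
   Check (Char_Counts.Element (Counts, 'b') = 2);
   Check (Char_Counts.Element (Counts, 'r') = 2);
   Check (Char_Counts.Element (Counts, 'c') = 1);
   Check (Char_Counts.Element (Counts, 'd') = 1);
end Histogram;
```

If Insert overwrote an existing element instead, every count would be reset
to 0 before being incremented and the histogram would report 1 for each
character.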
****************************************************************
From: Matthew Heaney
Sent: Wednesday, July 7, 2004 6:52 AM
Randy:
I moved the update_element procedure into the generic_keys nested package
and declared it this way:
   generic
      with function Key (Element : in Element_Type)
        return Key_Type;
   procedure Generic_Update_Element
     (Container : in out Set;
      Position  : in     Cursor;
      Process   : not null access
                    procedure (Element : in out Element_Type));
I made the Key function a generic formal subprogram for that operation
directly instead of for the entire nested generic package, since that's the
only operation that needs a key selector.
The basic algorithm is something like:
   procedure Generic_Update_Element (...) is
      Old_Key : Key_Type renames Key (Position.Element);
   begin
      Process (Position.Element);
      if Old_Key < Position.Element
        or else Old_Key > Position.Element
      then
         <remove node from tree>
         <insert node back into tree>
         if <insertion failed because duplicate key> then
            <free node>
            raise Constraint_Error;
         end if;
      end if;
   end Generic_Update_Element;
Note that you don't need to pass an "=" operator for keys, since you already
have "<" and ">" for comparing keys to elements.
One question I had was whether update element is allowed to change the key
(and hence change the relative position of the key's node). In the code
fragment above, the node is moved (deleted then immediately re-inserted)
when there's been a change in the value of the key, and there's an error
only if there's already a key with the new value.
Another way to have handled this is to not allow the re-insertion at all:
   if Old_Key < Position.Element
     or else Old_Key > Position.Element
   then
      <remove node from tree>
      <free node>
      raise Constraint_Error;
   end if;
This handles the key-change more pessimistically. It wasn't clear from your
notes which behavior was intended.
You can check out the latest packages here:
ordered sets spec:
<http://charles.tigris.org/source/browse/charles/src/ai302/a-coorse.ads>
ordered sets body:
<http://charles.tigris.org/source/browse/charles/src/ai302/a-coorse.adb>
I'll have the indefinite ordered sets (a-ciorse.ad[sb]) done in a day or
two.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, July 7, 2004 6:43 PM
...
> I made the Key function a generic formal subprogram for that operation
> directly instead of for the entire nested generic package, since that's the
> only operation that needs a key selector.
I think I prefer it to be on Generic_Keys, simply because that avoids the
need to use two instantiations.
...
> Note that you don't need to pass an "=" operator for keys, since you already
> have "<" and ">" for comparing keys to elements.
Right. I had already noted that in the draft minutes. I don't know if I
wrote Tucker's proposal down wrong, or if he made that mistake, but it's
clearly wrong when you read the whole discussion.
> One question I had was whether update element is allowed to change the key
> (and hence change the relative position of the key's node). In the code
> fragment above, the node is moved (deleted then immediately re-inserted)
> when there's been a change in the value of the key, and there's an error
> only if there's already a key with the new value.
I don't know; Tucker didn't cover that issue. My gut feeling is that the way
you have it is better, because dropping elements on the floor (even with an
exception) is nasty. If there is an alternative, it would be preferred. (Of
course, doing that could silently cause the routine to be expensive, which
is also annoying. But it wouldn't be much more expensive than Replace, so it
probably is OK.)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, July 7, 2004 9:31 AM
> I think I prefer it to be on Generic_Keys, simply because
> that avoids the need to use two instantiations.
OK. I declared the nested package like this:
   generic
      type Key_Type (<>) is limited private;
      with function Key (Element : in Element_Type)
        return Key_Type;
      with function "<" (Left : Key_Type; Right : Element_Type)
        return Boolean is <>;
      with function ">" (Left : Key_Type; Right : Element_Type)
        return Boolean is <>;
   package Generic_Keys is ...;
And I declared the operation like this:
   procedure Update_Element
     (Container : in out Set;
      Position  : in     Cursor;
      Process   : not null access
                    procedure (Element : in out Element_Type));
> > One question I had was whether update element is allowed to change the
> > key (and hence change the relative position of the key's node). In
> > the code fragment above, the node is moved (deleted then immediately
> > re-inserted) when there's been a change in the value of the key, and
> > there's an error only if there's already a key with the new value.
>
> I don't know; Tucker didn't cover that issue. My gut feeling
> is that the way you have it is better, because dropping
> elements on the floor (even with an
> exception) is nasty. If there is an alternative, it would be
> preferred. (Of course, doing that could silently cause the
> routine to be expensive, which is also annoying. But it
> wouldn't be much more expensive than Replace, so it probably is OK.)
Yes, but it's probably still going to beat any kind of normal insertion,
since the node isn't destroyed and the element isn't copied.
Another thing I realized is that if we do raise an exception (because the
new key duplicates an existing key), then the cursor that was passed in
effectively becomes a dangling reference.
When we delete a key thru a cursor, we set the cursor value to No_Element on
return. Another possibility for the declaration of Update_Element, that is
analogous to the Delete operation, is:
   procedure Update_Element
     (Container : in out Set;
      Position  : in out Cursor;  --NOTE MODE
      Process   : not null access
                    procedure (Element : in out Element_Type));
We could define the semantics as follows: if the key hasn't been modified,
then Position retains its value. Otherwise, the node is removed from the
tree and then re-inserted back into the tree. If the insertion was
successful, then Position retains its value. If the insertion was not
successful, then the node is deallocated and Position is set to No_Element.
(Note that there aren't any exceptions.)
It's just an idea, so I figured I'd bring it up. Since Update_Element now
has deletion semantics, maybe that should be more obvious.
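
[Editor's note: the following sketch is not part of the original mail. It
imitates, at client level, the exception-free semantics described above for
an Update_Element with an in out cursor. It works on a copy of the element
and uses Delete and Insert from Ada.Containers.Ordered_Sets as later
standardized in Ada 2005, so unlike the node-level algorithm it does copy
the element. The Employee type, the callbacks and Check are hypothetical.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Update_Element_Demo is

   type Employee is record
      Id     : Integer;   --  plays the role of the key
      Salary : Integer;
   end record;

   function "<" (Left, Right : Employee) return Boolean is
   begin
      return Left.Id < Right.Id;
   end "<";

   package Employee_Sets is new Ada.Containers.Ordered_Sets
     (Element_Type => Employee);

   procedure Check (Condition : Boolean) is
   begin
      if not Condition then
         raise Program_Error;
      end if;
   end Check;

   procedure Update_Element
     (Container : in out Employee_Sets.Set;
      Position  : in out Employee_Sets.Cursor;
      Process   : not null access procedure (Element : in out Employee))
   is
      Item     : Employee := Employee_Sets.Element (Position);
      Old_Id   : constant Integer := Item.Id;
      Inserted : Boolean;
   begin
      Process.all (Item);
      if Item.Id = Old_Id then
         --  Key unchanged: the element stays where it is.
         Employee_Sets.Replace_Element (Container, Position, Item);
      else
         --  Key changed: remove, then try to re-insert at the new place.
         Employee_Sets.Delete (Container, Position);
         Employee_Sets.Insert (Container, Item, Position, Inserted);
         if not Inserted then
            --  Duplicate key: the updated element is dropped.
            Position := Employee_Sets.No_Element;
         end if;
      end if;
   end Update_Element;

   procedure Give_Raise (E : in out Employee) is
   begin
      E.Salary := E.Salary + 10;
   end Give_Raise;

   procedure Renumber_To_3 (E : in out Employee) is
   begin
      E.Id := 3;
   end Renumber_To_3;

   S : Employee_Sets.Set;
   C : Employee_Sets.Cursor;

begin
   Employee_Sets.Insert (S, (Id => 1, Salary => 100));
   Employee_Sets.Insert (S, (Id => 2, Salary => 200));

   C := Employee_Sets.Find (S, (Id => 1, Salary => 0));
   Update_Element (S, C, Give_Raise'Access);      --  key unchanged
   Check (Employee_Sets.Element (C).Salary = 110);

   Update_Element (S, C, Renumber_To_3'Access);   --  key 1 becomes 3
   Check (Employee_Sets.Element (C).Id = 3);
   Check (not Employee_Sets.Contains (S, (Id => 1, Salary => 0)));

   C := Employee_Sets.Find (S, (Id => 2, Salary => 0));
   Update_Element (S, C, Renumber_To_3'Access);   --  collides with key 3
   Check (not Employee_Sets.Has_Element (C));
   Check (not Employee_Sets.Contains (S, (Id => 2, Salary => 0)));
end Update_Element_Demo;
```

The last call shows the deletion semantics mentioned above: the element with
key 2 disappears from the set, and the only signal to the caller is that the
cursor comes back as No_Element.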
****************************************************************
From: Randy Brukardt
Sent: Wednesday, July 7, 2004 7:13 PM
At the recent ARG meeting we discussed the meaning of Swap for Lists. When
working on the minutes of the meeting, I noticed this discussion, and I
think that the resolution of the issue was nonsense (which I failed to
realize at the time). So I'm bringing it up here for further discussion.
---
Here's the meeting minutes on the topic:
Swap for list: After a swap, do the cursors still designate the same
elements, or do they designate the swapped elements? The semantics should
be similar to that for an array: much like an index designates a “box” in the
array, a cursor designates a “box” holding an element, so after a swap, the
elements in the boxes should be changed. So we swap the elements, not the
nodes.
---
There are a couple of problems with this semantics:
* This cannot be implemented without an extra level of indirection for
indefinite containers. While an Ada standard implementation would
necessarily have such a level of indirection, it is easy to imagine an
implementation supporting an extension to Ada that would eliminate that
indirection and the costs thereof. That is, an implementation could support
a single indefinite component at the end of the node record (with the
requirement that such a record is initialized on creation); that would get
rid of extra allocation costs. I don't think we want to prevent such an
implementation solely for Swap.
* This completely eliminates any performance advantage for a special swap
routine for lists. Since the objects will be copied (at least in the
definite case), it would always be better for the programmer to code their
own routine that swapped the nodes.
As such, I don't see the point of defining this routine for Lists. Indeed,
since this routine only provides a performance advantage for indefinite
vectors (in all other cases the performance is the same or worse than
something built out of primitives), I have to wonder if it is worth having this
routine at all.
In any case, I think that the thinking about cursors reflected in the
minutes is flawed. A cursor designates an element - how that is accomplished
isn't supposed to be visible in the specification of the containers. We
specifically dropped a lot of wording about "nodes" for this reason - it's
not necessary to reason about the containers.
That said, it's obvious that the Swap for list should just swap the
positions of the elements in the list, and cursors should continue to
designate the same elements. (Logically, what they designate should be
unspecified, but that seems to be going too far for lists.) The vector one is
different, simply because cursors are weird -- we have all of the Bounded
Error rules to explain that they don't necessarily designate the same
element afterwards. Thus, for the vector one, it should be a Bounded Error
to use the cursors afterwards, just like for Insert and Delete. (Indexes
don't have that sort of issue, and usually using indexes would be
preferred.)
Another alternative is to recognize that these operations are fundamentally
different, and thus give them different names. But that seems to be rather
confusing.
All in all, I don't think (and I have *never* thought) that a swap operation
is worth having for any of the containers, because it is just too
complicated to define sensibly.
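
[Editor's note: the following sketch is not part of the original mail. It
contrasts the two list semantics under discussion: exchanging the elements
held in two "boxes" versus exchanging the nodes themselves. It assumes
Ada.Containers.Doubly_Linked_Lists as later standardized in Ada 2005, which
declares both Swap (elements) and Swap_Links (nodes). Check is a
hypothetical helper.]

```ada
with Ada.Containers.Doubly_Linked_Lists;

procedure Swap_Demo is

   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists
     (Element_Type => Integer);
   use Int_Lists;

   procedure Check (Condition : Boolean) is
   begin
      if not Condition then
         raise Program_Error;
      end if;
   end Check;

   L    : List;
   I, J : Cursor;

begin
   Append (L, 10);
   Append (L, 20);
   Append (L, 30);
   I := First (L);   --  designates 10
   J := Last (L);    --  designates 30

   --  Element swap: each cursor keeps its place in the list, but the
   --  values in the two places are exchanged. The list is now 30, 20, 10.
   Swap (L, I, J);
   Check (I = First (L) and then Element (I) = 30);
   Check (J = Last (L) and then Element (J) = 10);

   --  Node swap: each cursor keeps designating the same element, but
   --  the two nodes trade places. The list is now 10, 20, 30.
   Swap_Links (L, I, J);
   Check (I = Last (L) and then Element (I) = 30);
   Check (J = First (L) and then Element (J) = 10);
end Swap_Demo;
```

The element swap copies both values, while the node swap only relinks
pointers, which is the performance difference raised in the message above.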
****************************************************************
From: Nick Roberts
Sent: Wednesday, July 7, 2004 7:31 PM
I have to say, I am annoyed that you didn't follow my suggested model of
internal cursors. If you had done so, you simply wouldn't have these
problems.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, July 7, 2004 8:19 PM
I haven't the foggiest idea what you are talking about; there aren't any
messages from you in Ada Comment with both "internal" and "cursor" that
remotely discuss a "model". (If you want to post a reference, *please* just
give the date of the message, don't quote the whole thing. There is far too
much quoting going on around here, which makes it redundant for later
readers of the mail.)
And, in any case, the model of cursors really has nothing to do with this.
Swap is a stupid operation; it's used mostly in student assignments and
sub-optimal sorting algorithms. I don't think I've used Swap for anything
else in my nearly 30 years of programming -- simply because it always
requires three copies of something, and that's 1/3 too many.
Since the only real use of Swap is in the sorting algorithms, making it
user-visible is stupid. If it makes some people feel good, fine, as long as
it doesn't screw up the model. It isn't worth that.
****************************************************************
From: Nick Roberts
Sent: Thursday, July 8, 2004 7:30 AM
> I haven't the foggiest idea what you are talking about; there isn't
> any messages from you in Ada Comment with both "internal" and
> "cursor" that remotely discuss a "model".
It was something that was lengthily discussed in comp.lang.ada, the ASCL
mailing list, and other places. I'm surprised you don't remember anything
about it, Randy, I thought everyone knew about it. My proposals were
posted on the web for a long time. Swing your browser to:
http://www.adapower.net/ascl/
It's all still there.
> And, in any case, the model of cursors really has nothing to do
> with this.
I think it does. You were arguing that Swap should be dropped for
doubly-linked lists (and maybe also for vectors) because, with the current
cursor model, it would be difficult (impossible?) to implement
the same semantics for both the lists and the vectors. With my model,
there would be no such difficulty.
> Swap is a stupid operation; it's used mostly in student
> assignments and sub-optimal sorting algorithms.
Sub-optimal? Such as Quicksort and Mergesort?
> Since the only real use of Swap is in the sorting algorithms,
> making it user-visible is stupid.
I don't know about 'stupid', but less than vitally important perhaps.
> If it makes some people feel good, fine, as long as
> it doesn't screw up the model. It isn't worth that.
Well, I agree, at this stage. I just think it's a pity my model wasn't
used from the outset.
****************************************************************
From: Matthew Heaney
Sent: Thursday, July 8, 2004 9:35 AM
> That said, it's obvious that the Swap for list should just swap the
> positions of the elements in the list, and cursors should continue to
> designate the same elements. (Logically, what they designate should be
> unspecified, but that seems to be going to far for lists.) The vector one is
> different, simply because cursors are weird -- we have all of the Bounded
> Error rules to explain that they don't necessarily designate the same
> element afterwards. Thus, for the vector one, it should be a Bounded Error
> to use the cursors afterwards, just like for Insert and Delete. (Indexes
> don't have that sort of issue, and usually using indexes would be
> preferred.)
So if you were in favor of retaining Swap for lists, then does this mean
that you want the original, pre-Palma semantics? (Swap "exchanges
nodes," to use Matt's term.)
****************************************************************
From: Randy Brukardt
Sent: Thursday, July 8, 2004 1:36 PM
Yes. The Palma semantics don't do anything useful, and certainly are not
what you would write yourself if you had a need to swap two nodes in a
hand-programmed list.
****************************************************************
From: Randy Brukardt
Sent: Friday, July 2, 2004 2:28 PM
> It was something that was lengthily discussed in comp.lang.ada, the ASCL
> mailing list, and other places. I'm surprised you don't remember
> anything about it, Randy, I thought everyone knew about it.
Oh, I didn't realize that you were talking about (ancient) pre-history.
Since such things are not recorded with the AI (meaning that it's impossible
to go back and look them up), references to them are going to be more
confusing than enlightening. In this case, I thought that you were referring
to some specific suggestion for how the wording should be written for *this*
proposal -- thus the confusion.
In any case, I think virtually everyone here has an idea of how they would
design this library differently if they were doing it. Whether those ideas
are better or not is pretty much irrelevant, as we've decided to use Matt's
basic design - in large part because Matt has done more thinking on this
topic (including usage issues) than nearly anyone else, *and* he submitted a
complete proposal as a starting point. I certainly would not have
implemented cursors as Matt has; but our job at this point is to ensure that
the description of those cursors is correct, and that the operation set is
complete and described correctly -- not to gripe about some other design
being better. (We all know that "best is the enemy of good enough", and that
certainly applies here.)
****************************************************************
From: Matthew Heaney
Sent: Friday, July 9, 2004 9:33 AM
Given Randy's analysis, I think the vector and list swap operations look
like this:
For vectors:
   procedure Swap (Container : in out Vector;
                   I, J      : in     Index_Type'Base);
   procedure Swap (I, J : in Cursor);
For lists:
   procedure Swap (Container : in out List;
                   I, J      : in     Cursor);
The semantics are as follows.
For the vector swap operations, the elements in the vector are swapped.
One question we need to answer is whether cursors in the cursor-based
swap for vectors remain valid. I know that our working model is that
cursors designate elements (rather than positions within the vector), so
if the element moves then the cursor is supposed to follow the element.
(In the reference implementation the cursors do remain valid, since
internally they're just index values. But this breaks the model above,
since I and J continue to designate the same relative positions as
before the swap, and hence I and J deliver different element values
following the swap. Compare this to the list behavior below.)
For the list swap operation, the nodes designated by I and J are
relinked. (I know I'm not supposed to say "nodes" or "relinked," but
just bear with me.) I and J continue to designate the same nodes before
and after the swap (and hence, I and J return the same element values as
existing prior to the swap). Following the swap, I designates the node
in J's former relative position in the list, and J designates the node
in I's former relative position. (These are just the pre-Palma
semantics.)
Note that the behaviors for vector swap and list swap are different. I
think that's OK, since the reason this operation exists is to allow you
to take advantage of the particular representation of the container, in
a way that allows swap to be more efficient.
In the case of the definite vector, the implementation will most likely
exchange element values by creating at least one temporary. So in this
one case, swap doesn't confer any great advantage besides convenience
(and we need it anyway, since the spec must be identical to the
indefinite vector).
For the indefinite vector, swap really does confer an advantage, since
some form of element indirection is implied, and so the implementor can
swap internal pointers instead of elements. (This was our original
motivation for introducing a swap operation.)
The swap for the definite and indefinite lists work the same, by
relinking internal nodes. (In the pre-Palma reference implementation, I
implemented swap for lists using Splice.)
Note that each of the three operations declared above has a different
signature. That means there's no possibility that switching from a
vector to a list (say) will suddenly introduce different swap semantics
(presumably the application was depending on the original semantics),
since the signatures are different and hence the change will be caught
by the compiler.
(You need to pass the list to the list swap operation, since the list
caches pointers to the first and last nodes, and so if you're moving
nodes then the cache values might change. In the vector case, you're
moving elements not nodes, so you don't need to pass the vector object.)
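[Editor's note: A minimal sketch of the two behaviors described above. It assumes the Ada.Containers packages as later standardized in Ada 2005, not the draft under discussion in this mail. In that library the node-exchanging list operation is named Swap_Links. The instance names, object names, and values are illustrative only, and the assertions are checked only when assertions are enabled.]

```ada
with Ada.Containers.Vectors;
with Ada.Containers.Doubly_Linked_Lists;

procedure Swap_Demo is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists (Integer);
   use type Int_Lists.Cursor;

   V    : Int_Vectors.Vector;
   L    : Int_Lists.List;
   A, B : Int_Lists.Cursor;
begin
   --  Vector: swapping by index exchanges the element values.
   V.Append (10);
   V.Append (20);
   V.Swap (1, 2);
   pragma Assert (V.Element (1) = 20 and V.Element (2) = 10);

   --  List: exchanging nodes leaves each cursor on its original node,
   --  so A still yields 10 but now sits at the end of the list.
   L.Append (10);
   L.Append (20);
   A := L.First;
   B := L.Last;
   L.Swap_Links (A, B);
   pragma Assert (Int_Lists.Element (A) = 10);
   pragma Assert (A = L.Last and B = L.First);
end Swap_Demo;
```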
****************************************************************
From: Matthew Heaney
Sent: Saturday, July 10, 2004 11:44 AM
Now that container types are publicly tagged, it might make sense to declare
some container type parameters as class-wide.
For example, the generic_sort and generic_merge for lists are generic
operations, which means these operations aren't primitive for the list type.
Should they be declared this way:
   generic
      with function "<" (Left, Right : Element_Type)
         return Boolean is <>;
   procedure Generic_Sort (Container : in out List'Class);
   generic
      with function "<" (Left, Right : Element_Type)
         return Boolean is <>;
   procedure Generic_Merge (Target : in out List'Class;
                            Source : in out List'Class);
If a user does derive from List (say), then he would have to convert back to
the parent type in order to call an instantiation of generic_sort or
generic_merge, but that's kind of a pain.
In the case of generic_merge, another possibility is to pass the type as a
generic formal:
   generic
      type List_Type is new List with private;
      with function "<" (Left, Right : Element_Type)
         return Boolean is <>;
   procedure Generic_Merge (Target : in out List_Type;
                            Source : in out List_Type);
This would force target and source to have the same type. But of course
it's more work to instantiate.
For the sets, there are a couple of predicate functions that accept two set
objects. The declarations would look like:
   function Is_Subset (Item      : Set;
                       Container : Set'Class)
      return Boolean;
   function Is_Disjoint (Item      : Set;
                         Container : Set'Class)
      return Boolean;
(Note that we still haven't decided what the parameter names should be --
should "Container" come first?)
The sets package also has the nested generic Generic_Keys. None of its
operations are primitive for type Set either. That might be an argument for
declaring all of the set container parameters as type Set'Class. (Another
possibility is to pass in the set type as a generic formal.)
Note that the sets have the union, intersection, etc operations, and the
lists have the splice operations, but I think those can stay as is.
****************************************************************
From: Nick Roberts
Sent: Monday, August 2, 2004 4:23 PM
Randy's done a great job, as usual, updating this AI.
For most (maybe all) of the changes, I say "Hooray!" As always, I have
some comments. Happily, these are all very minor issues. Just dismiss
any question already answered or issue already dealt with (and accept
my apologies).
* In line 205, swap two items (to accord with the order of presentation
of the four container kinds), from:
The following major non-limited containers are provided:
* (Expandable) Vectors of any non-limited type;
* Doubly-linked Lists of any non-limited type;
* Ordered Sets of any non-limited type;
* Hashed Maps keyed by any non-limited type containing any
non-limited type.
to:
The following major non-limited containers are provided:
* (Expandable) Vectors of any non-limited type;
* Doubly-linked Lists of any non-limited type;
* Hashed Maps keyed by any non-limited type containing any
non-limited type.
* Ordered Sets of any non-limited type;
* Rephrase:
Separate versions for definite element types are provided, as those
can be implemented more efficiently.
as:
Separate versions for definite and indefinite element types are
provided, as those for definite types can be implemented more
efficiently.
* Typo in line 253:
specify precisely where this will happen (it will happen no latter
than the
Change 'latter' to 'later'.
* Add a requirement (or Imp Adv) for Hash_Type'Modulus to be a power
of two? (Line 299.)
* In line 307, clarify what the 'back end' of a vector is :-)
Maybe:
The language-defined package Containers.Vectors provides a private
type Vector and a set of operations. A vector container allows
insertion and deletion at any position, but it is specifically
optimized for insertion and deletion at the back end of the
container (the end with the highest index). A vector container
also provides random access to its elements.
* I suggest to rephrase:
A vector container object manages an unconstrained internal array,
which expands as necessary as items are inserted. The *capacity* of
a vector corresponds to the total length of the internal array, and
the *length* of a vector corresponds to the number of active
elements in the internal array.
A vector container may contain *empty elements*. Empty elements do
not have a specified value.
as:
A vector has a *length*, of the type Containers.Count_Type, which
varies dynamically and is the number of elements the vector
contains. These elements are the *active elements* of the vector.
Their indices are of the subtype Index_Type, and occupy the range
f .. f+n-1, where f is Index_Type'First and n is the length of the
vector.
A vector container object manages an internal array, which expands
as necessary. The *capacity* of a vector corresponds to the number
of elements which can be stored in the internal array, and will
always be no less than the length of the vector.
An active element may be *empty*. An empty element does not have a
specified value, and it is an unbounded error to read an empty
element.
I know this is a little bit more long-winded, but I think it is a bit
clearer.
* Maybe the AARM note at line 329:
The internal array is a virtual structure. There is no requirement
for the implementation to be a single contiguous array.
could be better phrased:
The internal array is a conceptual model for the purposes of
defining the semantics of vectors. There is no requirement for an
implementation to actually use a single contiguous array.
* Maybe add a AARM note:
pragma Assert (Index_Type'Base'First < Index_Type'First);
It is essential that Last_Index is always able to return a valid
value, including for an empty vector. It cannot do this if
Index_Type'Base'First = Index_Type'First, so it is a requirement
that Index_Type'Base'First < Index_Type'First for any instantiation
of Containers.Vectors, and this pragma enforces the requirement.
* What is the purpose of Index_Subtype? (This question seems to have
been raised but not answered.)
* For vectors, lists, maps, and sets the procedures named 'Iteration'
and 'Reverse_Iteration', which were generic but are now procedures
taking an access-to-subprogram parameter, would perhaps now be more
appropriately named 'Iterate' and 'Reverse_Iterate'? (Or 'Traverse'
and 'Reverse_Traverse'? :-)
* Should the same thing (change from generic procedure to procedure
with access-to-subprogram parameter) be done for Generic_Sort? (I
guess the answer is to do with efficiency.)
* Should procedure 'Assign' be called 'Copy', to emphasise its
distinction from normal assignment? Should it have start and end
parameters? (I see this idea already got suggested.)
* The description for Last_Index line 987:
Returns the position of the last element in Container.
could be better expressed as:
If the length of Container is 0, Last_Index returns
First_Index(Container) - 1, otherwise it returns the index of the
last active element in Container.
* Typo in line 1065:
function Reverse_Find (Container : Vector;
Item : Element_Type;
-> Index : Index_Type'Base := Index_Type'Las))
return Index_Type'Base;
The 't' is missed off 'Last' and there's an extra ')'.
* There may be a call ambiguity problem with Find and Reverse_Find.
These are provided for both an index and cursor starting point:
function Find (Container : Vector;
Item : Element_Type;
Index : Index_Type'Base := Index_Type'First)
return Index_Type'Base;
function Find (Container : Vector;
Item : Element_Type;
Position : Cursor := No_Element)
return Cursor;
A call Find(Cont,Item) in a context which could require either an
index or a cursor will be ambiguous. I'm just pointing this one out
at the moment; maybe it's too trivial to worry about.
* In the description for Splice, line 1635:
last node of Container. The length of Target is incremented, and
the length of Source is decremented.
'Container' needs to be changed to 'Target'.
* In line 1788:
AARM Note: The name is "Hashed_Maps" to allow for a secondary
standard to include "Sorted_Maps".
Maybe the suggested name should be "Ordered_Maps", by analogy to the
package "Ordered_Sets"?
* For the map 'Replace' procedure:
procedure Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
Replace inserts Key and New_Item as per Insert, with the difference that
if Key is already in the map, then this operation assigns New_Item to
the element associated with Key. Any exceptions raised during assignment
are propagated.
I think this procedure should be called 'Replace_or_Insert', to avoid
potential confusion about its semantics to anyone reading code using
it. This point has already been argued about, so maybe it should rest
as is now.
* For the Delete at cursor procedure:
procedure Delete (Container : in out Map;
Position : in out Cursor);
If Position equals No_Element, this operation has no effect.
Otherwise, Delete removes the node from the map and deallocates the
node.
Position is set to No_Element on return.
Is it such a good idea for Position to have mode 'in out'? This
prevents constructions such as (concocting an example):
Delete(M,Next(C));
which might be handy sometimes.
* Minor typo in line 2118:
If Length (Container) > Count, then Constraint_Error is
propogated.
Change 'propogated' to 'propagated'. Also, on the following line,
perhaps the phraseology should be:
Otherwise, Set_Capacity allocates ...
I'd suggest the same for the wording for the following First and Next
functions.
* Is it really appropriate for the Key_Type formal parameter of
generic package Ordered_Sets.Generic_Keys to be limited? I don't think
it matters a great deal, but it just might assist some implementations
by making it non-limited (since it would allow them to make internal
copies). I doubt that a limited type would ever be required in practice:
if the key is directly extracted from an element, it cannot be limited
(because the element cannot); if it is indirectly derived, it must be
via an access value, in which case the access type can be used as the
key type. That Key_Type is limited doesn't seem to match its conceptual
abstraction, to my mind.
* In line 2473 and 2788, within package Generic_Keys:
procedure Update_Element
-> (Position : in Cursor;
Process : not null access procedure (Element : in out
Element_Type));
Would it be better to make the mode of Position 'in out', so that if
re-insertion occurs the cursor 'tracks' the new position?
* In those places where "Index_Type'Succ(X)" is used, couldn't X + 1
be used instead? Index_Type is an integer type (isn't it?).
* In various places appears a phrase like:
Any exceptions raised ... are propagated.
I think these should be rephrased in the singular:
Any exception raised ... is propagated.
since only one exception can be propagated (the first to occur).
* As an example of counting the number of angels dancing on the tip of
a needle, I wonder if the word 'expansile' should be used instead of
'expandable'. I think there is a tiny difference of meaning, in that
the latter suggests something that can be expanded deliberately (like
a set of modular shelves) whereas the former suggests something that
inherently tends to expand (as one's bladder, on a visit to the pub).
*** A few side-notes:
The semantics expressed by:
While Set_Capacity can be used to reduce the capacity of a map, we
do not specify whether an implementation actually supports reduction
of the capacity. Since the actual capacity can be anything greater
than or equal to Count, an implementation never has to reduce the
capacity.
is exactly what I wanted. Cool.
Some of the AARM notes are brilliant. E.g. "The implementation needs
to take care so that aliasing effects do not make the result trash;
Union (S, S); must work." This is totally cool; it's just the kind
of gotcha that causes me to have bad nights.
Finally I'd like to express -- and I very much hope this doesn't sound
trite or ingratiating -- my admiration for Matt and Randy for all the
perspiration and inspiration (in whichever proportion) they have put
into this Amendment. I hope it gets passed.
In fact, I just want to add a note that I hope nothing I have posted to
this mailing list comes across as being deliberately antagonistic or
offensive (towards Randy or anyone else). Although I may often be blunt
in the way I put things, and I may often be strident in my criticisms,
there has never been on my part any element of personal animosity in
any of my comments. On the contrary, I may disagree vehemently with
someone on a particular issue, without having any less respect for them.
I feel that it is in the nature of the best technical forums that the
participants retain a sense of personal respect for the others, however
passionate their disagreement may be. Nuff said.
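[Editor's note: A small sketch of the Last_Index wording suggested above (an empty vector yields First_Index (Container) - 1). It assumes the Ada.Containers.Vectors package as later standardized in Ada 2005, where that value is also available as the constant No_Index. The instance and object names are illustrative only.]

```ada
with Ada.Containers.Vectors;

procedure Last_Index_Demo is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   use Int_Vectors;

   V : Vector;
begin
   --  Empty vector: Last_Index is one below the first index.
   pragma Assert (V.Last_Index = V.First_Index - 1);
   pragma Assert (V.Last_Index = No_Index);

   --  One element: first and last index coincide.
   V.Append (42);
   pragma Assert (V.Last_Index = V.First_Index);
end Last_Index_Demo;
```

This only works because Index_Type'First - 1 is a value of Index_Type'Base, which is the requirement the suggested pragma Assert note is about.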
****************************************************************
From: Matthew Heaney
Sent: Monday, August 9, 2004 4:36 PM
Nick Roberts wrote:
>
> * What is the purpose of Index_Subtype? (This question seems to have
> been raised but not answered.)
It's there for composability. You want to be able to write operations
and declare variables in terms of the index type used to instantiate the
package.
> * For vectors, lists, maps, and sets the procedures named 'Iteration'
> and 'Reverse_Iteration', which were generic but are now procedures
> taking an access-to-subprogram parameter, would perhaps now be more
> appropriately named 'Iterate' and 'Reverse_Iterate'? (Or 'Traverse'
> and 'Reverse_Traverse'? :-)
I have made the same comment in my review of the AI. (The name should
be the verb phrase "Iterate", not the noun phrase "Iteration".)
> * Should the same thing (change from generic procedure to procedure
> with access-to-subprogram parameter) be done for Generic_Sort? (I
> guess the answer is to do with efficiency.)
Yes, it has to do with efficiency. Predefined relational operators for
a type are intrinsic, so you can't take their address and hence cannot
pass the operation as the value of an anonymous access parameter.
> * Should procedure 'Assign' be called 'Copy', to emphasise its
> distinction from normal assignment? Should it have start and end
> parameters? (I see this idea already got suggested.)
No. It's called Assign because it's an assignment operation. (In this
case, more efficient than operator ":=".)
The lexical rule is: Assign takes the Target as the first parameter and
Source as the second parameter, while Copy takes the Source as the first
parameter and target as the second parameter, like this:
   procedure Assign
     (Target : in out T;
      Source : in     T);
   procedure Copy
     (Source : in     T;
      Target : in out T);
So even if you were to change the name, you'd have to change the
parameter order too. But you have to put the target as the first
parameter, in order to use distinguished receiver syntax:
Target.Assign (Source);
Is the same as:
Target := Source;
but the former is more efficient than the latter (for a vector).
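[Editor's note: A minimal sketch of the distinguished-receiver call shown above. It assumes a compiler providing the Assign operation that was later standardized for Ada.Containers.Vectors in Ada 2012. The names are illustrative, and the sketch only checks that the result equals the source; it does not measure the efficiency claim.]

```ada
with Ada.Containers.Vectors;

procedure Assign_Demo is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   use type Int_Vectors.Vector;

   Source, Target : Int_Vectors.Vector;
begin
   Source.Append (1);
   Source.Append (2);

   --  Target comes first, so prefixed notation reads naturally.
   Target.Assign (Source);

   pragma Assert (Target = Source);
   pragma Assert (Natural (Target.Length) = 2);
end Assign_Demo;
```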
> * There may be a call ambiguity problem with Find and Reverse_Find.
> These are provided for both an index and cursor starting point:
>
> function Find (Container : Vector;
> Item : Element_Type;
> Index : Index_Type'Base := Index_Type'First)
> return Index_Type'Base;
>
> function Find (Container : Vector;
> Item : Element_Type;
> Position : Cursor := No_Element)
> return Cursor;
>
> A call Find(Cont,Item) in a context which could require either an
> index or a cursor will be ambiguous. I'm just pointing this one out
> at the moment; maybe it's too trivial to worry about.
I discussed this with Randy, and he reasoned that ambiguity wouldn't be
a problem because in the typical case you name the return value anyway.
However, we do have First_Index and Last_Index, and so I would be in
favor of renaming the first operation Find_Index.
> * For the map 'Replace' procedure:
>
> procedure Replace (Container : in out Map;
> Key : in Key_Type;
> New_Item : in Element_Type);
>
> Replace inserts Key and New_Item as per Insert, with the difference that
> if Key is already in the map, then this operation assigns New_Item to
> the element associated with Key. Any exceptions raised during assignment
> are propagated.
>
> I think this procedure should be called 'Replace_or_Insert', to avoid
> potential confusion about its semantics to anyone reading code using
> it. This point has already been argued about, so maybe it should rest
> as is now.
Too much verbosity. Let's just call it "Insert".
> * For the Delete at cursor procedure:
>
> procedure Delete (Container : in out Map;
> Position : in out Cursor);
>
> If Position equals No_Element, this operation has no effect.
> Otherwise, Delete removes the node from the map and deallocates the
> node.
> Position is set to No_Element on return.
>
> Is it such a good idea for Position to have mode 'in out'? This
> prevents constructions such as (concocting an example):
>
> Delete(M,Next(C));
>
> which might be handy sometimes.
But you're deallocating the node designated by Position, so you have to
set the cursor to No_Element.
I haven't had a need to do as in your example, but I have had a need to
iterate through the map and delete each element in turn. I can do this
in C++ by saying
my_map.erase(my_iter++);
but I don't have that option in Ada. But it's no big deal, just do this:
   declare
      I : Cursor := First (M);
      J : Cursor;
   begin
      while Has_Element (I) loop
         Update_Element (I, Finalize'Access);
         J := I;
         Next (I);
         M.Delete (J);
      end loop;
   end;
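[Editor's note: A self-contained version of the loop above, as a sketch only. It assumes the Ada 2005 library as later standardized and uses Ada.Containers.Ordered_Maps (an editorial substitution, chosen so that no hash function is needed). The per-element Finalize step is omitted because that procedure is not defined in the mail.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Delete_All_Demo is
   package Maps is new Ada.Containers.Ordered_Maps (Integer, Integer);
   use Maps;

   M    : Map;
   I, J : Cursor;
begin
   for K in 1 .. 5 loop
      M.Insert (K, K * K);
   end loop;

   I := M.First;
   while Has_Element (I) loop
      J := I;        --  remember the current node
      Next (I);      --  advance before the node is deallocated
      M.Delete (J);  --  Delete resets J to No_Element
      pragma Assert (J = No_Element);
   end loop;

   pragma Assert (M.Is_Empty);
end Delete_All_Demo;
```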
> * Is it really appropriate for the Key_Type formal parameter of
> generic package Ordered_Sets.Generic_Keys to be limited? I don't think
> it matters a great deal, but it just might assist some implementations
> by making it non-limited (since it would allow them to make internal
> copies).
But you can make an internal copy (in fact, the reference implementation
does), you just have to use renames instead of assignment:
declare
Copy_Of_Key : Key_Type renames Key (E);
begin
> I doubt that a limited type would ever be required in practice:
> if the key is directly extracted from an element, it cannot be limited
> (because the element cannot); if it is indirectly derived, it must be
> via an access value, in which case the access type can be used as the
> key type. That Key_Type is limited doesn't seem to match its conceptual
> abstraction, to my mind.
But Generic_Keys is written in terms of what Generic_Keys requires. It
only needs function Key to implement Update_Element, and it doesn't need
assignment (since it can use renames).
> * In line 2473 and 2788, within package Generic_Keys:
>
> procedure Update_Element
> -> (Position : in Cursor;
> Process : not null access procedure (Element : in out
> Element_Type));
>
> Would it be better to make the mode of Position 'in out', so that if
> re-insertion occurs the cursor 'tracks' the new position?
Hmmmm. Not sure what you mean here, since Position would seem to
already "track" the new position. The cursor continues to designate the
same internal storage node before and after the call (even if the
relative position of the node changes), and so it doesn't need to be inout.
The more substantive issue in my mind is what happens if the key value
changes and it matches another key in the set. In that case we
deallocate the storage node and then raise Constraint_Error.
I don't really like that, since the cursor is now designating a node
which has been deallocated. I'd rather say:
procedure Update_Element_Or_Delete
(Position : in out Cursor;
Process : ...);
and then not raise any exception if there's a match. Rather, we
deallocate the node and then set Position to No_Element.
****************************************************************
From: Matthew Heaney
Sent: Thursday, July 12, 2004 6:42 PM
One of the things we did in Palma was to move the Update_Element operation
for ordered sets into its nested package Generic_Keys, to allow it to check
whether the key was modified.
One of the consequences of moving that operation is that we no longer have
an operation to pass the set element as a parameter (without also
instantiating the nested generic). For example:
   procedure Print (S : Set) is
      procedure Print (E : in out ET) is
      begin
         null;  -- we only need to query E, not modify it
      end;
      procedure Process (C : Cursor) is
      begin
         Update_Element (C, Print'Access);
      end;
   begin
      S.Iterate (Process'Access);
   end Print;
We can do this only if we also instantiate the nested generic. Of course,
we could also use the selector function Element for cursors, to return a
copy of the element, but this isn't very attractive for large elements.
I think what's missing is an operation like Update_Element, but with the
difference that the access procedure parameter accepts the element with in
mode. Something like:
   procedure Query_Element
     (Position : in Cursor;
      Process  : not null access procedure (E : in ET));
This would be declared outside of the Generic_Keys nested package. In fact,
I think it makes sense to add this operation for all containers.
This would allow us to write the example above like this:
   procedure Print (S : Set) is
      procedure Print (E : in ET) is
      begin
         null;  -- we only need to query E, not modify it
      end;
      procedure Process (C : Cursor) is
      begin
         Query_Element (C, Print'Access);
      end;
   begin
      S.Iterate (Process'Access);
   end Print;
Also, in Palma we added a Key selector function to the generic formal region
of Generic_Keys. Now that we have that, I think it makes sense for
Generic_Keys to provide the following additional operation:
function Key (Position : Cursor) return Key_Type;
This allows Generic_Keys to more closely mimic the spec of the (hashed) map.
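[Editor's note: A runnable sketch of the Query_Element pattern proposed above. It assumes the Ada.Containers.Ordered_Sets package as later standardized in Ada 2005, where Query_Element has this profile. The instance name, the Add procedure, and the values are illustrative only. The element is passed with mode in, so it is read in place and no copy is returned.]

```ada
with Ada.Containers.Ordered_Sets;

procedure Query_Demo is
   package Int_Sets is new Ada.Containers.Ordered_Sets (Integer);
   use Int_Sets;

   S   : Set;
   Sum : Integer := 0;

   --  Read-only visitor: E cannot be modified here.
   procedure Add (E : in Integer) is
   begin
      Sum := Sum + E;
   end Add;

   procedure Process (C : Cursor) is
   begin
      Query_Element (C, Add'Access);
   end Process;
begin
   S.Insert (1);
   S.Insert (2);
   S.Insert (3);
   S.Iterate (Process'Access);
   pragma Assert (Sum = 6);
end Query_Demo;
```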
****************************************************************
From: Matthew Heaney
Sent: Thursday, September 9, 2004 9:14 PM
> -----Original Message-----
> From: Randy Brukardt [mailto:randy@rrsoftware.com]
> Sent: Tuesday, September 07, 2004 1:13 PM
>
> Of course, we don't have the Hashed_Set and Ordered_Map
> containers, and they will come up in practice. I'm now
> convinced that their omission was a mistake. I recently had a
> case where I could have used a map, but creating a decent
> hash function was going to be substantial work not justified
> by the number of elements to be used. Clearly, an Ordered_Map
> would be much easier to use in such a case. The examples
> given by Tucker are good examples of the use of a Hashed_Set.
As a sort of thought experiment, I implemented a hashed set. It's up at the
CVS repository.
<http://charles.tigris.org/source/browse/charles/src/ai302/a-cohase.ads?rev=HEAD&sortby=date&content-type=text/vnd.viewcvs-markup>
<http://charles.tigris.org/source/browse/charles/src/ai302/a-cohase.ads?sortby=date#dirlist>
<http://charles.tigris.org/source/browse/charles/src/ai302/?sortby=date#dirlist>
<http://charles.tigris.org/>
It looks just like the ordered set, except for the generic formal region,
and the addition of Set_Capacity, etc.
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 9, 2004 9:56 PM
The problem, of course, is that the description of the operations for
the standard would be different, and it's very late to be adding anything
significant.
I would hope that most implementers would provide a Hashed_Set patterned on
your package. (This is the sort of thing that an IWA would be very good for.)
****************************************************************
From: Matthew Heaney
Sent: Friday, September 11, 2004 3:09 AM
I just uploaded an ordered map.
http://charles.tigris.org/source/browse/charles/src/ai302/a-coorma.ads?rev=HEAD&sortby=date&only_with_tag=HEAD&content-type=text/vnd.viewcvs-markup
http://charles.tigris.org/source/browse/charles/src/ai302/a-coorma.ads
****************************************************************
From: Nick Roberts
Sent: Sunday, September 12, 2004 10:18 AM
I think this proposal is looking pretty polished now.
I hope it won't seem wrong for me to put my own responses to Randy's
questions (to the ARG, I guess) at the end of the latest update to this AI.
Apologies if the quoting seems heavy.
> Q1) Find_Index returns Last_Index (Container) + 1 if the element is not
> found. This seems consistent to me (it's past the end of the container in
> a forward search), but Matt worries that First_Index (Container) - 1
> might be thought of as better. The trouble with First_Index (Container) -
> 1 is that you can't put it into an object:
> declare
> I : Index_Type := Index_Type'First;
> begin
> I := Find_Index (Vect, Item, I);
> while I <= Last_Index (Vect) loop
> -- Do something to the element I.
> I := Find_Index (Vect, Item, I+1);
> end loop;
> end;
> If Find_Index returned Index_Type'First - 1, saving the result of
> Find_Index would raise Constraint_Error if the item is not found. That's
> not what we want, I think.
The problem with Last_Index (Container) + 1 is that it may not exist,
because Last_Index (Container) might be Index_Type'Last. On the other hand,
we conveniently have a requirement that Index_Type'First >
Index_Type'Base'First, which guarantees that First_Index (Container) - 1
does always exist (as a value of Index_Type'Base).
I think that swings it. I suggest Find_Index returns First_Index
(Container) - 1 when it does not find what it is looking for.
It might be convenient to declare two extra things in the
Containers.Vectors package:
subtype Find_Index_Result is
Index_Type'Base range Index_Type'First-1 .. Index_Type'Last;
Not_Found: constant Find_Index_Result := Find_Index_Result'First;
Obviously the [Reverse_]Find_Index functions can then have
Find_Index_Result as their return types.
The example can then be reformulated as:
   declare
      I : Find_Index_Result := Index_Subtype'First;
   begin
      I := Find_Index (Vect, Item, I);
      while I /= Not_Found loop
         -- Do something to the element I.
         I := Find_Index (Vect, Item, I+1);
      end loop;
   end;
or alternatively:
   declare
      I : Find_Index_Result := Index_Subtype'First - 1;
   begin
      loop
         I := Find_Index (Vect, Item, I+1);
         exit when I = Not_Found;
         -- Do something to the element I.
      end loop;
   end;
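[Editor's note: A sketch of the same search loop against the Ada.Containers.Vectors package as later standardized in Ada 2005 (an assumption relative to this mail). There the subtype Extended_Index and the constant No_Index play the roles of the Find_Index_Result and Not_Found suggested here. The instance name and values are illustrative only.]

```ada
with Ada.Containers.Vectors;

procedure Find_Demo is
   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   use Int_Vectors;

   V     : Vector;
   I     : Extended_Index;
   Count : Natural := 0;
begin
   V.Append (7);
   V.Append (3);
   V.Append (7);

   I := V.Find_Index (7);
   while I /= No_Index loop
      Count := Count + 1;            --  "do something" with element I
      exit when I = V.Last_Index;    --  nothing left to search
      I := V.Find_Index (7, I + 1);
   end loop;

   pragma Assert (Count = 2);
end Find_Demo;
```

Because No_Index is Index_Type'First - 1, it can always be stored in an Extended_Index object, which is the point argued above.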
> Q2) The parameters to Generic_Merge have not been made class-wide (even
> though the comments about non-primitive operations with specific tagged
> parameters mentioned for Generic_Sort hold here, too). That's because
> both parameters need to be the same type. An alternative would be to make
> them class-wide, and then have a runtime check (of the actual tags) that
> they actually are the same type. But that is not very O-O. A third
> possibility would be to repeat the type in the generic spec:
> generic
> type List_Type is new List with private;
> with function "<" (Left, Right : Element_Type)
> return Boolean is <>;
> procedure Generic_Merge (Target : in out List_Type;
> Source : in out List_Type);
> But that is not very consistent with the rest of the specification. Some
> guidance would be helpful here.
I'm uncomfortable with Generic_Sort having a classwide parameter:
generic
with function "<" (Left, Right : Element_Type) return Boolean is <>;
procedure Generic_Sort (Container : in out List'Class);
This is because the actual sorting operation can only be on objects of the
root type, both conceptually and actually. The implementation would have to
typecast the parameter Container to List anyway.
I think it would make much more sense for Generic_Sort to be declared:
generic
with function "<" (Left, Right : Element_Type) return Boolean is <>;
procedure Generic_Sort (Container : in out List);
and the typecast to be done in the call instead. For example:
package Float_Lists is new Ada.Containers.Doubly_Linked_Lists(Float);
procedure Sort is new Float_Lists.Generic_Sort;
type My_List is new Float_Lists.List with ...;
...
L1: My_List;
...
Sort( Float_Lists.List(L1) );
I prefer this because it makes it explicit that the list L1 is being
sorted /as/ a Float_Lists.List (not as a My_List, as such).
To extend the example a bit further:
procedure Merge is new Float_Lists.Generic_Merge;
...
type Freds_List is new Float_Lists.List with ...;
...
L2: Freds_List;
...
Merge( Float_Lists.List(L2), Float_Lists.List(L1) );
I think this is the right formulation, since again it makes it explicit
that you are merging the two lists /as/ Float_Lists.Lists (which is what
makes them compatible enough to merge anyway).
As is rightly pointed out, the implementations of Generic_Sort and
Generic_Merge cannot use dispatching operations of their parameters
(lists), so they must both be specific to their root types anyway. For this
reason, I would suggest neither should have a classwide parameter.
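[Editor's note: a sketch of the call style argued for above, in which the sort takes the specific root type and the caller writes the view conversion. It assumes an Ada 2005 or later compiler and the later standardized library, where sorting is provided by the nested generic package Generic_Sorting rather than by a Generic_Sort procedure. My_List, Sorting and Sort_As_Root_Demo are illustrative names.]

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Sort_As_Root_Demo is
   package Float_Lists is new Ada.Containers.Doubly_Linked_Lists (Float);

   --  "<" on Float is picked up by default.
   package Sorting is new Float_Lists.Generic_Sorting;

   --  A derived list type, as in the example above.
   type My_List is new Float_Lists.List with null record;

   L1 : My_List;
begin
   L1.Append (3.0);
   L1.Append (1.0);
   L1.Append (2.0);

   --  Sort takes the specific type Float_Lists.List, so the caller
   --  converts explicitly: L1 is sorted /as/ a Float_Lists.List.
   Sorting.Sort (Float_Lists.List (L1));

   Ada.Text_IO.Put_Line
     ("sorted: "
      & Boolean'Image (Sorting.Is_Sorted (Float_Lists.List (L1))));
end Sort_As_Root_Demo;
```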
I hope I haven't totally missed the point here!
Note also that there is a typo (lines 1418-1419): in the package
specification itself the parameters to Generic_Merge are both still cited
as List'Class.
> Q6) Tucker has mentioned that he often has components in the key of a map
> beyond the actual key participating ones. (This is similar to the
> behavior of a set; if we had a Hashed_Set this probably would be less of
> an issue.) For that to be effective, it would be necessary to change a
> key that is already in a map. Currently, neither Replace_Element nor
> Insert_or_Replace change the value of a key that is in the map; only the
> element is changed.
>In order to get the sort of semantics that Tucker seems to be suggesting,
> we'd need a way to change the value of a key. But such an operation would
> potentially change the location of the element, so it could be fairly
> expensive. Moreover, it would likely require allocation even if the hash
> didn't change for the indefinite form of the container.
>Finally, whether or not the key is replaced would seem to be another
> (orthogonal) option for the Insert routine "6) Insert replaces the key
> and the element when the key is already in the map; 7) Insert replaces
> the key, leaving the element unchanged when the key is already in the
> map".
>This complication doesn't seem worth it to me, but as it came up very
> late, the entire ARG needs to discuss the issue.
I think this is a case of "Don't do that!"
Isn't the basic idea that key values are there specifically to provide fast
indexed access to the elements? I don't think they are intended to be
used to carry ancillary information.
It may sometimes be convenient to use an existing type (which has
non-participating components) as a key type, but I think, in these cases,
the non-participating components of the key should be moved into the
element. If you have type T1 (with non-participating components) you want
to use as a key for type T2, I think the proper design is to declare a new
type T3, with only participating components, and a new type T4 which has
the remaining components of type T1 and all those of T2 (or alternatively
two components, of type T1 and T2). You then use T3 to index T4 (from which
you extract the components of T1 or T2 as required).
So, I agree that a key replacement operation should not be added.
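[Editor's note: a sketch of the T3/T4 split described above, with the non-participating data moved out of the key and into the element. It assumes an Ada 2005 or later compiler with Ada.Containers.Ordered_Maps. Part_Key, Part_Entry and their components are hypothetical names chosen for illustration.]

```ada
with Ada.Containers.Ordered_Maps;
with Ada.Text_IO;

procedure Key_Split_Demo is
   --  T3: only the component that participates in comparison.
   type Part_Key is range 1 .. 9999;

   --  T4: the data that used to ride along in the key, plus the
   --  original element.
   type Part_Entry is record
      Label    : String (1 .. 4);  --  formerly a non-participating key part
      Quantity : Natural;          --  the original element
   end record;

   package Part_Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Part_Key, Element_Type => Part_Entry);

   Parts : Part_Maps.Map;
begin
   Part_Maps.Insert (Parts, 46, (Label => "0046", Quantity => 3));

   --  The ancillary label changes; the lookup key 46 does not.
   Part_Maps.Replace (Parts, 46, (Label => "46  ", Quantity => 3));

   Ada.Text_IO.Put_Line (Part_Maps.Element (Parts, 46).Label);
end Key_Split_Demo;
```

With this layout the ancillary data can be updated through the ordinary element operations, so no key replacement operation is needed.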
> Q3) The generic formal part for maps has:
> with function "=" (Left, Right : Key_Type)
> return Boolean is <>;
> with function Is_Equal_Key (Left, Right : Key_Type)
> return Boolean is "=";
>Matt wonders why both operations are needed; [etc.]
I hate this difference. I think it is counter-intuitive that keys have two
different kinds of equality, and the reason why relates to my answer to Q6:
I think the principle we should stick to is that the purpose of key values
is solely to provide fast indexed access to a set of element values. On
this principle, I think it is intuitively the case that the equality that is
implied by the ordering operation on the keys:
(not A<B) and (not B<A) |- A=B
is the one and only equality of keys. (I use |- for semantic entailment.)
I therefore favour dropping one of the equality operations.
I'd like one equality operation, to be named "=", having the role currently
fulfilled by Is_Equal_Key, and also used for the map equality test.
To obtain the functionality required in the example -- that two maps,
identical except for one entry which has key="46" in one map and key="0046"
in the other (but "46"="0046" according to the generic "=" function), are
considered unequal -- one could include a duplicate of the key as a
component of the element type.
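[Editor's note: a small sketch of the equality implied by an ordering, using the "46" versus "0046" keys mentioned above. The ordering function Less, which compares numeric strings by value, is a hypothetical stand-in for the generic formal "<".]

```ada
with Ada.Text_IO;

procedure Equivalence_Demo is
   --  Hypothetical ordering: compare numeric strings by their value.
   function Less (Left, Right : String) return Boolean is
   begin
      return Integer'Value (Left) < Integer'Value (Right);
   end Less;

   --  The equality implied by the ordering:
   --  (not A<B) and (not B<A).
   function Equivalent (Left, Right : String) return Boolean is
   begin
      return not Less (Left, Right) and then not Less (Right, Left);
   end Equivalent;

   A : constant String := "46";
   B : constant String := "0046";
begin
   Ada.Text_IO.Put_Line ("equivalent: " & Boolean'Image (Equivalent (A, B)));
   Ada.Text_IO.Put_Line ("predefined: " & Boolean'Image (A = B));
end Equivalence_Demo;
```

The two keys are equivalent under the ordering (the first line reports TRUE) but differ under the predefined string equality (the second reports FALSE), which is exactly the gap between the two formal functions under discussion.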
> Q4) Set_Capacity is defined to raise Constraint_Error if
> Length (Container) > Count. Matt would prefer that this case is not
> separately handled. He would like
> Set_Capacity (M, 0)
>to be a shorthand for setting the Map or Vector to the smallest
> reasonable size. (I find this a bit odd, as Matt never wanted this
> routine to even allow smaller values. But whatever.) Note that just
> dropping the check would not be enough; we'd have to redo the description
> of the operation to say that the capacity is set to at least
> Count_Type'Max (Count, Length(Container)) -- because we don't want this
> operation to drop elements. I'm unsure that the benefit is worth the
> change, and it seems like a bug to me to try to set the capacity of a
> container to be smaller than the number of elements it holds.
The suggested semantics could only be a convenience. The suggested example
could always be replaced by:
Set_Capacity( M, Length(M) );
I don't think Set_Capacity is going to be used very often, in practice. I
believe that on the (relatively few) occasions when a reviewer or
maintainer saw:
Set_Capacity (Widgets, 0);
after actions which clearly caused Widgets to contain something, there
would be a significant risk of confusion and misunderstanding. This code
might be mistakenly removed, on the assumption it was faulty.
I think this is a case where readability is more important than conciseness
of code, and I think using an explicit length will usually be more
readable. At least, I think its meaning is a bit more intuitively obvious.
~~~
Q5 was about using an extra parameter of an enumerated type in insertion
and replacement procedures to indicate what should happen if the key exists
(insertion) or doesn't exist (replacement), with a default indicating that
an exception should be raised.
I rather prefer this idea. I think it is the least of three evils (no user
choice, a plethora of procedures, or control-coupling).
~~~
Another question that arises in my mind is, for any of the functions which
returns a container: what is the capacity of the result? In particular,
should this be defined by the standard, or remain implementation defined?
[Call this Q7?]
****************************************************************
From: Matthew Heaney
Sent: Monday, September 13, 2004 12:56 AM
Each comment immediately follows the text to which it refers, and is
bracketed with "MJH:" and "ENDMJH." pairs.
Some of these items are already on the agenda for Madison, so for those
items I'll just summarize the issue.
Doubly-linked lists:
Empty_List : constant List;
--NOTE: function or deferred constant?
MJH:
We can get rid of this note in the spec, since we now know that Empty_List
is a deferred constant.
ENDMJH.
generic
with function "<" (Left, Right : Element_Type)
return Boolean is <>;
procedure Generic_Merge (Target : in out List'Class;
Source : in out List'Class);
MJH:
We need to resolve the declaration of the parameter types in Madison. We
can either declare the two types as class-wide (as above), or import the
list as a generic formal type, like this:
generic
type List_Type is new List with private;
with function "<" (Left, Right : Element_Type)
return Boolean is <>;
procedure Generic_Merge (Target : in out List_Type;
Source : in out List_Type);
Note also that the description of this operation uses type List (probably
just a typo).
Other possibilities for List_Type are:
type List_Type (<>) is new List with private;
or maybe:
type List_Type (<>) is abstract new List with private;
ENDMJH.
procedure Insert (Container : in out List;
Before : in Cursor;
New_Item : in Element_Type;
Count : in Count_Type := 1);
Insert allocates Count new nodes whose element is initialized to the value
New_Item, and inserts them prior to the node designated by Before. If
Before equals No_Element, the new nodes are inserted immediately
following the last node (if any). Any exception raised during allocation of
internal storage is propagated, and Container is not modified.
MJH:
We have to decide whether partial success is allowed. (The last sentence
above looks vestigial, and might have been written prior to the introduction
of the Count parameter.)
For example, if Count is 10, and we're only able to allocate, say, 7 nodes,
then can the list be modified such that its length only grew by 7 nodes?
ENDMJH.
procedure Swap (Container : in out List;
I, J : in Cursor);
Swap exchanges the nodes designated by I and J.
AARM Note: Unlike Swap_Elements for vectors, this exchanges the nodes, not
the elements. No copying is performed. I and J designate the same elements
after this call as they did before it. This is important, as this operation
is provided as it can provide better performance than a straight copying
swap. The programmer can writing a copying swap if they need one. This
difference in semantics is the reason that these operations have different
names in the List and Vector containers.
MJH:
The penultimate sentence should say:
"The programmer can write a copying swap if he needs one."
We also need to specify the behavior when one or both of the parameters
equal No_Element.
It probably wouldn't hurt anything to add a Swap_Elements operation, too.
If an implementor uses pointers to elements for the indefinite form, then at
least that would confer a performance benefit (the same as for the
indefinite vector).
ENDMJH.
Hashed maps:
generic
type Key_Type is private;
type Element_Type is private;
with function Hash (Key : Key_Type) return Hash_Type;
with function "=" (Left, Right : Key_Type)
return Boolean is <>;
with function Is_Equal_Key (Left, Right : Key_Type)
return Boolean is "=";
with function "=" (Left, Right : Element_Type)
return Boolean is <>;
package Ada.Containers.Hashed_Maps is
MJH:
We need to resolve the characteristics of this generic formal region in
Madison.
Note that in Palma we decided to use the name "Equivalent" instead of
"Is_Equal_Key".
ENDMJH.
procedure Query_Element
(Position : in Cursor;
Process : not null access procedure (Element : in Element_Type));
procedure Update_Element
(Position : in Cursor;
Process : not null access procedure (Element : in out Element_Type));
MJH:
We might need some other operations, if we're serious about manipulating
(and possibly modifying) keys. Here are some ideas:
procedure Query_Key
(Position : in Cursor;
Process : not null access procedure (Key : in Key_Type));
procedure Query_Key_And_Element
(Position : in Cursor;
Process : not null access procedure (Key : in Key_Type;
Element : in Element_Type));
procedure Query_Key_And_Update_Element
(Position : in Cursor;
Process : not null access procedure (Key : in Key_Type;
Element : in out Element_Type));
procedure Checked_Update_Key
(Container : in out Map;
Position : in Cursor;
Process : not null access procedure (Key : in out Key_Type));
ENDMJH.
procedure Insert_Or_Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
MJH:
Tucker thought that this operation replaced the value of the key too, so we
need to confirm the exact semantics of this operation.
ENDMJH.
function Is_Equal_Key (Left, Right : Cursor)
return Boolean;
function Is_Equal_Key (Left : Cursor;
Right : Key_Type)
return Boolean;
function Is_Equal_Key (Left : Key_Type;
Right : Cursor)
return Boolean;
MJH:
We need to decide about these ops in Madison: either change the name, or
keep the name, or get rid of them.
ENDMJH.
procedure Set_Capacity (Container : in out Map;
Capacity : in Count_Type);
If Length (Container) > Capacity, then Constraint_Error is propagated.
Otherwise, Set_Capacity allocates a new hash table such that the length of
the resulting map can become at least the value Capacity without requiring
an additional Set_Capacity operation. If the allocation fails, the exception
is propagated and Container is not modified. It then rehashes the nodes in
Container onto the new hash table. It replaces the old hash table with the
new hash table, and then deallocates the old hash table.
MJH:
(This comment applies to vector, too.)
We have already discussed the fact that I don't think raising CE is
appropriate for this operation.
The purpose of an exception is to indicate that the postcondition cannot be
satisfied. If for example, a very large value for Capacity were requested,
and the implementation were unable to allocate the requisite storage, then an
exception would be appropriate.
However, the postcondition here is "Capacity(Container) >= Capacity", and an
invariant is "Capacity(Container) >= Length(Container)". If Capacity <
Length(Container), then the implementation will allocate a capacity that is
at least the container's length, thus satisfying both the postcondition and
the invariant. Therefore an exception is not appropriate.
Note that the STL member function reserve() (which is equivalent to
Set_Capacity) does *not* raise an exception.
ENDMJH.
Ordered Sets:
function Is_Disjoint (Left, Right : Set) return Boolean;
MJH:
I had originally chosen the name Is_Disjoint to be consistent with Is_In.
This was also the name favored by John Barnes. However, Is_In has since
been renamed to Contains. Should we consider a similar name change for
Is_Disjoint?
function Overlaps (Left, Right : Set) return Boolean;
This is the name Tucker prefers.
ENDMJH.
generic
type Key_Type (<>) is limited private;
with function Key (Element : Element_Type) return Key_Type;
with function "<" (Left : Key_Type; Right : Element_Type)
return Boolean is <>;
with function ">" (Left : Key_Type; Right : Element_Type)
return Boolean is <>;
package Generic_Keys is
MJH:
We have to make a decision about whether we want to pass the set type as a
generic actual type, like this:
generic
type Set_Type is new Set with private;
type Key_Type (<>) is limited private;
..
and then use Set_Type everywhere for operations. Or instead declare the
container parameters as type Set'Class.
Actually, another declaration might be:
type Set_Type (<>) is new Set with private;
or possibly:
type Set_Type (<>) is abstract new Set with private;
(However we declare Set_Type, we should do the same for List_Type of
Generic_Merge.)
ENDMJH.
procedure Checked_Update_Element
(Container : in Set;
Position : in Cursor;
Process : not null access procedure (Element : in out
Element_Type));
MJH:
The Container param should be inout, not in. (I think you already fixed
this.)
ENDMJH.
!examples
Ordered Sets:
Another technique would be to use an active iterator, like this:
procedure Shutdown_Connections is
I : Cursor;
X : Connection_Access;
begin
while not Is_Empty (Connection_Set) loop
I := First (Connection_Set);
X := Element (I);
Delete (Connection_Set, Position => I);
Free (X);
end loop;
end Shutdown_Connections;
Here we use the cursor-form of Delete. This is probably more efficient
than using the item-form of Delete, since the cursor-form doesn't have
to search for the item.
MJH:
We might also want to say that this example can be simplified as follows:
START EXAMPLE TEXT
The example can be simplified by using the set operations that
manipulate the first element specifically:
procedure Shutdown_Connections is
X : Connection_Access;
begin
while not Is_Empty (Connection_Set) loop
X := First_Element (Connection_Set);
Delete_First (Connection_Set);
Free (X);
end loop;
end Shutdown_Connections;
END EXAMPLE TEXT
ENDMJH.
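[Editor's note: the simplified drain loop can be exercised in a self-contained form. This sketch assumes an Ada 2005 or later compiler with Ada.Containers.Ordered_Sets, and uses a set of Integer in place of the Connection_Access values, so there is nothing to Free. Drain_Demo and Int_Sets are illustrative names.]

```ada
with Ada.Containers.Ordered_Sets;
with Ada.Text_IO;

procedure Drain_Demo is
   package Int_Sets is new Ada.Containers.Ordered_Sets (Integer);

   S : Int_Sets.Set;
   X : Integer;
begin
   S.Insert (3);
   S.Insert (1);
   S.Insert (2);

   --  Remove the first element until the set is empty; no cursor
   --  and no search by item are needed.
   while not S.Is_Empty loop
      X := S.First_Element;
      S.Delete_First;
      Ada.Text_IO.Put_Line ("released" & Integer'Image (X));
   end loop;
end Drain_Demo;
```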
To actually change the employee's address in the example above, we use
the special element modifier operation:
procedure Change_Address
(SSN : SSN_Type;
New_Home : Home_Address_Type) is
procedure Set_Home
(Employee : in out Employee_Type) is
begin
Employee.Home := New_Home;
end;
Position : Cursor := Find (Employees, Key => SSN);
begin
if Has_Element (Position) then
SSN_Keys.Checked_Update_Element
(Position => Position,
Process => Set_Home'Access);
...
end if;
end Change_Address;
MJH:
The call to Checked_Update_Element needs to pass the set, too:
SSN_Keys.Checked_Update_Element
(Container => Employees,
Position => Position,
Process => Set_Home'Access);
ENDMJH.
Another technique is to use Checked_Update_Element, which allows the
element's key to be modified, and then moves the element to its new
relative location in the set:
procedure Change_SSN
(Old_SSN : SSN_Type;
New_SSN : SSN_Type) is
Old_Position, New_Position : Cursor;
Inserted : Boolean;
begin
if New_SSN = Old_SSN then
return;
end if;
Old_Position := Find (Employees, Key => Old_SSN);
if not Has_Element (Old_Position) then
return;
end if;
New_Position := Find (Employees, Key => New_SSN);
if Has_Element (New_Position) then
raise Duplicate_SSN;
end if;
declare
procedure Set_SSN (Employee : in out Employee_Type) is
begin
Employee.SSN := New_SSN;
end;
begin
SSN_Keys.Checked_Update_Element
(Position => Old_Position,
Process => Set_SSN'Access);
end;
end Change_SSN;
MJH:
Same thing here: we need to pass the set as a parameter:
SSN_Keys.Checked_Update_Element
(Container => Employees,
Position => Old_Position,
Process => Set_SSN'Access);
ENDMJH.
Suppose now we want to list all the employees in the firm. One way to do
it is like this:
procedure Display is
procedure Print (I : in Employee_Sets.Cursor) is
procedure Do_Print (E : in out Employee_Type) is
begin
Put ("Name: "); Put (E.Name);
Put ("SSN: "); Put (E.SSN);
...;
end;
begin
Query_Element (Position => I, Process => Do_Print'Access);
end;
begin
Iterate (Employees, Print'Access);
end;
MJH:
The mode for the (generic) actual Do_Print should be just in, not inout.
ENDMJH.
begin
Sort (Cursors);
for Index in Cursors'Range loop
C := Cursors (Index);
Query_Element (Position => C, Process => Do_Print'Access);
end loop;
end Display_Employees_In_Name_Order;
MJH:
All of the process procedures above need to change the mode from in to
inout.
ENDMJH.
This lets us perform session lookups based on the session identifier:
procedure Play
(Session_Id : in String;
NPT_Range : in NPT_Range_Type;
RTSP_Status : out RTSP_Status_Type) is
Position : constant Session_Set_Types.Cursor :=
Find (Session_Set, Key => Session_Id);
MJH:
We might want to use name qualification here:
Position : constant Session_Set_Types.Cursor :=
Id_Keys.Find (Session_Set, Key => Session_Id);
ENDMJH.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, September 14, 2004 2:13 AM
My comments are inline. --MJH
> > Q1) Find_Index returns Last_Index (Container) + 1 if the element is
> >not found. This seems consistent to me (it's past the end of the
> >container in a forward search), but Matt worries that First_Index
> >(Container) - 1 might be thought of as better. The trouble with
> >First_Index (Container) - 1 is that you can't put it into an object:
...
> The problem with Last_Index (Container) + 1 is that it may
> not exist, because Last_Index (Container) might be
> Index_Type'Last.
But Find_Index returns Index_Type'Base. There's an implied requirement that
Last_Index(V) < Index_Type'Base'Last, so it doesn't matter if Last_Index(V)
= Index_Type'Last.
Remember the purpose of a vector is to expand. That's what influenced our
decision to use an integer type as the index instead of any discrete type.
Your argument that Last_Index(V) might equal Index_Type'Last is tantamount
to saying that an enumeration type should be able to be used as the index
type, but this matter has already been debated and settled.
Allowing an Index_Type to be passed as a generic formal is really only
intended to allow the user to specify the starting point of the index
subtype range. The ending point should be large relative to the number of
elements typically stored in the container.
I would expect that most users would either use subtypes Natural or Positive
as the generic actual index type. If the starting point of your range has
some negative values, then you could do something like:
type Index_Type is range -42 .. Integer'Pos (Integer'Last);
or
type Index_Type is range -2001 .. Count_Type'Pos (Count_Type'Last);
> On the other hand, we conveniently have a
> requirement that Index_Type'First > Index_Type'Base'First,
> which guarantees that First_Index (Container) - 1 does always
> exist (as a value of Index_Type'Base).
>
> I think that swings it. I suggest Find_Index returns First_Index
> (Container) - 1 when it does not find what it is looking for.
I find Randy's argument more persuasive, and hence agree with him that we
don't need any change here.
We need to affirm the semantics of Reverse_Find_Index too, since you could
make the argument that for symmetry with Find_Index, it should return
Index_Type'Pred (Index_Type'First). On the other hand, you could argue
that for consistency with Find_Index that it should return the same value.
> > Q2) The parameters to Generic_Merge have not been made class-wide
> >(even though the comments about non-primitive operations
> with specific
> >tagged parameters mentioned for Generic_Sort hold here,
> too). That's
> >because both parameters need to be the same type. An
> alternative would
> >be to make them class-wide, and then have a runtime check (of the
> >actual tags) that they actually are the same type. But that is not
> >very O-O. A third possibility would be to repeat the type
> in the generic spec:
> > generic
> > type List_Type is new List with private;
> > with function "<" (Left, Right : Element_Type)
> > return Boolean is <>;
> > procedure Generic_Merge (Target : in out List_Type;
> > Source : in out List_Type);
> >But that is not very consistent with the rest of the specification.
> >Some guidance would be helpful here.
The generic formal type should probably be declared as:
type List_Type (<>) is new List with private;
or possibly
type List_Type (<>) is abstract new List with private;
> I'm uncomfortable with Generic_Sort having a classwide parameter:
...
> I prefer this because it makes it explicit that the list L1
> is being sorted /as/ a Float_Lists.List (not as a My_List, as such).
But this argument applies to any operation. We shouldn't have to call sort
differently, just because it happens to be generic.
You could pass the list type as a generic actual, but that seems like
overkill when the operation only has a single parameter.
We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
> > Q6) Tucker has mentioned that he often has components in the key of a
> >map beyond the actual key participating ones. (This is similar to the
...
> I think this is a case of "Don't do that!"
>
> Isn't the basic idea that key values are there specifically
> to provide fast indexed access to the elements? I don't think
> they are intended to be used to carry ancillary information.
Well, that's the debate. We can use either model, but we have to choose one.
Essentially, the API as it stands now has not been designed to facilitate
replacement of key values. The problem is that to support key modification,
there will most likely be some kind of penalty, either static (more complex
interface) or dynamic (less efficient execution). The decision is whether
the added generality is worth the penalty.
> > Q4) Set_Capacity is defined to raise Constraint_Error if Length
> >(Container) > Count. Matt would prefer that this case is not
> >separately handled. He would like
> > Set_Capacity (M, 0)
> >to be a shorthand for setting the Map or Vector to the smallest
> >reasonable size. (I find this a bit odd, as Matt never wanted this
> >routine to even allow smaller values. But whatever.) Note that just
> >dropping the check would not be enough; we'd have to redo the
> >description of the operation to say that the capacity is set to at
> >least Count_Type'Max (Count, Length(Container)) -- because we don't
> >want this operation to drop elements. I'm unsure that the benefit is
> >worth the change, and it seems like a bug to me to try to set the
> >capacity of a container to be smaller than the number of elements it
> >holds.
>
> I think this is a case where readability is more important
> than conciseness of code, and I think using an explicit
> length will usually be more readable. At least, I think its
> meaning is a bit more intuitively obvious.
The meaning of an operation is its postcondition. (See Meyer's description
of Hoare triples in OOSC, read David Gries' book, and read EWD's books and
technical notes. For more information about the proper use of exceptions,
see the paper at Bjarne Stroustrup's home page, or just read Appendix E of
TC++PL.)
The postcondition of Set_Capacity is:
Capacity(V) >= Capacity
A vector (or hashed map) also satisfies the invariant that:
Capacity(V) >= Length(V)
Set_Capacity does whatever is necessary to satisfy both of these predicates.
Whether the requested capacity is less than the current length is
irrelevant, since the representation invariant already handles that case.
An exception would only be proper if Set_Capacity were unable to satisfy the
postcondition or the invariant.
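[Editor's note: a toy model of the semantics argued for above, in which Set_Capacity satisfies both the postcondition and the invariant instead of raising an exception. Toy_Container is a hypothetical stand-in that models only the two counters; it is not the library's vector or map, and no storage is actually managed.]

```ada
with Ada.Containers; use Ada.Containers;
with Ada.Text_IO;

procedure Set_Capacity_Demo is
   --  Hypothetical stand-in for a vector or hashed map.
   type Toy_Container is record
      Length   : Count_Type := 0;
      Capacity : Count_Type := 0;
   end record;

   --  Postcondition: C.Capacity >= Capacity.
   --  Invariant:     C.Capacity >= C.Length.
   procedure Set_Capacity
     (C : in out Toy_Container; Capacity : in Count_Type) is
   begin
      C.Capacity := Count_Type'Max (Capacity, C.Length);
   end Set_Capacity;

   M : Toy_Container := (Length => 5, Capacity => 16);
begin
   --  Request less than the current length: no exception, and the
   --  capacity shrinks only as far as the length allows.
   Set_Capacity (M, 0);

   Ada.Text_IO.Put_Line ("capacity =" & Count_Type'Image (M.Capacity));
   Ada.Text_IO.Put_Line ("length   =" & Count_Type'Image (M.Length));
end Set_Capacity_Demo;
```

Under this model Set_Capacity (M, 0) and Set_Capacity (M, M.Length) leave the container in the same state; the remaining disagreement in the thread is about which spelling reads better.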
> ~~~
>
> Q5 was about using an extra parameter of an enumerated type
> in insertion and replacement procedures to indicate what
> should happen if the key exists
> (insertion) or doesn't exist (replacement), with a default
> indicating that an exception should be raised.
>
> I rather prefer this idea. I think it is the least of three
> evils (no user choice, a plethora of procedures, or control-coupling).
I have already stated that I am not in favor of this change. (It's at the
wrong level of abstraction, for one thing.)
> ~~~
>
> Another question that arises in my mind is, for any of the
> functions which returns a container: what is the capacity of
> the result? In particular, should this be defined by the
> standard, or remain implementation defined? [Call this Q7?]
All this standard says is that Capacity(C) >= Length(C). If you want more
control of the capacity, then use the procedures, not the functions.
****************************************************************
From: Nick Roberts
Sent: Tuesday, September 14, 2004 10:06 AM
Matthew Heaney wrote:
>>The problem with Last_Index (Container) + 1 is that it may
>>not exist, because Last_Index (Container) might be
>>Index_Type'Last.
>
> But Find_Index returns Index_Type'Base. There's an implied requirement that
> Last_Index(V) < Index_Type'Base'Last, so it doesn't matter if Last_Index(V)
> = Index_Type'Last.
If there is such a requirement, it should be explicit. But I would be
alarmed if such a requirement really was imposed: it would be a classic
potential source of obscure bugs.
> Remember the purpose of a vector is to expand. That's what influenced our
> decision to use an integer type as the index instead of any discrete type.
> Your argument that Last_Index(V) might equal Index_Type'Last is tantamount
> to saying that an enumeration type should be able to be used as the index
> type, but this matter has already been debated and settled.
I don't think so at all (that Last_Index(V) might equal Index_Type'Last is
tantamount to saying that an enumeration type should be able to be used
as the index type). On an implementation that has a 16-bit Integer'Base
type, and where V is of a package instantiated with Index_Type Positive, it
is quite feasible that Last_Index(V) becomes 32767.
> Allowing an Index_Type to be passed as a generic formal is really only
> intended to allow the user to specify the starting point of the index
> subtype range. The ending point should be large relative to the number of
> elements typically stored in the container.
Why? Why should it (Index_Type'Last) not be equal to the maximum number of
elements to be stored? It would be prudent to be sure that the maximum was
never exceeded, in such a case.
Let me give you an example. Supposing we are programming a card game, and
we need a container that can contain a 'hand', which is a list of cards.
The rules of this game do not limit the number of cards in a hand, as such,
but since the cards come from a pack of 52, we can be absolutely certain
that the limit of 52 will never be exceeded. It would therefore be quite
reasonable (wouldn't it?) to write:
type Card_Count is range 0..52;
subtype Card_Index is Card_Count range 1..52;
package Hand_Vectors is
new Ada.Containers.Vectors (Card_ID, Card_Index);
...
Dealer: Hand_Vectors.Vector := Random_52;
Hands: array (Player_ID) of Hand_Vectors.Vector;
We could move cards between Dealer and Hands throughout the program, safe
in the knowledge that although a length of 52 is possible (certain, for
Dealer), that length can never be exceeded.
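[Editor's note: whether Last_Index + 1 exists for such an index subtype depends on the base range the implementation chooses for Card_Count, which the language leaves implementation-defined beyond covering the declared range. The sketch below simply reports what a given compiler does; Base_Range_Demo is an illustrative name, and no particular output is claimed.]

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Base_Range_Demo is
   type Card_Count is range 0 .. 52;
   subtype Card_Index is Card_Count range 1 .. 52;

   --  A variable rather than a constant, so that Last + 1 below is
   --  evaluated at run time and not as a static expression.
   Last : Card_Index'Base := Card_Index'Last;
begin
   Put_Line ("Card_Index'Last      ="
             & Card_Index'Image (Card_Index'Last));
   Put_Line ("Card_Index'Base'Last ="
             & Card_Index'Base'Image (Card_Index'Base'Last));

   if Last < Card_Index'Base'Last then
      Last := Last + 1;
      Put_Line ("Last_Index + 1 is representable:"
                & Card_Index'Base'Image (Last));
   else
      Put_Line ("Last_Index + 1 does not exist in the base range");
   end if;
end Base_Range_Demo;
```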
>>I think it would make much more sense for Generic_Sort to be declared:
>>[not classwide]
>>and the typecast to be done in the call instead. For example:
>>...
>> Sort( Float_Lists.List(L1) );
>>
>>I prefer this because it makes it explicit that the list L1
>>is being sorted /as/ a Float_Lists.List (not as a My_List, as such).
>
> But this argument applies to any operation. We shouldn't have to call sort
> differently, just because it happens to be generic.
We have to call Sort differently because its argument isn't of type
Float_Lists.List. This fact applies to any non-inherited operation of List,
generic or not. That is my point.
> You could pass the list type as a generic actual, but that seems like
> overkill when the operation only has a single parameter.
I quite agree, and I'd argue the same for Generic_Merge (which only has two
parameters). I'm also saying that making them classwide would be overkill.
> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
What are the alternatives, please?
>>Isn't the basic idea that key values are there specifically
>>to provide fast indexed access to the elements? I don't think
>>they are intended to be used to carry ancillary information.
>
> Well, that's the debate. We can use either model, but we have to choose one.
Okay.
> Essentially, the API as it stands now has not been designed to facilitate
> replacement of key values. The problem is that to support key modification,
> there will most likely be some kind of penalty, either static (more complex
> interface) or dynamic (less efficient execution). The decision is whether
> the added generality is worth the penalty.
I don't think it is. I'm saying that I do indeed think we should stick to
the model that key values are only for the purpose of indexing.
>>> [re: Set_Capacity(M,n) where n<Length(M)]
>>I think this is a case where readability is more important
>>than conciseness of code, and I think using an explicit
>>length will usually be more readable. At least, I think its
>>meaning is a bit more intuitively obvious.
>
> The meaning of an operation is its postcondition.
I'm not arguing about the semantic meaning of the operation, Matt! How can
I put it more plainly? I'm saying that this is a matter of readability.
The semantic meanings of your suggested Set_Capacity(M,0) and of
Set_Capacity(M,Length(M)) are identical, so this is simply not an issue.
I am arguing that Set_Capacity(M,Length(M)) is more readable, and for that
reason we should disallow Set_Capacity(M,0). That is all.
In fact, I would accept the procedure being renamed (again ;-) to
Set_Minimum_Capacity with the semantics you suggest. Or possibly declare
the procedure as:
procedure Set_Capacity (Container: in out Vector|List|etc;
Minimum: in Count_Type);
so that the call can be made:
Set_Capacity (M, Minimum => 0);
>>Q5 was about using an extra parameter of an enumerated type
>>in insertion and replacement procedures to indicate what
>>should happen if the key exists
>>(insertion) or doesn't exist (replacement), with a default
>>indicating that an exception should be raised.
>>
>>I rather prefer this idea. I think it is the least of three
>>evils (no user choice, a plethora of procedures, or control-coupling).
>
> I have already stated that I am not in favor of this change. (It's at the
> wrong level of abstraction, for one thing.)
Okay, but I feel that arguments about the level of abstraction are a bit of
a nicety here. Am I wrong that we are faced with three choices, none of
which is plainly ideal?
>>Another question that arises in my mind is, for any of the
>>functions which returns a container: what is the capacity of
>>the result? In particular, should this be defined by the
>>standard, or remain implementation defined? [Call this Q7?]
>
> All this standard says is that Capacity(C) >= Length(C). If you want more
> control of the capacity, then use the procedures, not the functions.
So, are you saying that the standard /should/ leave it implementation
defined what the capacity is?
****************************************************************
From: Matthew Heaney
Sent: Tuesday, September 14, 2004 1:02 PM
>> But Find_Index returns Index_Type'Base. There's an implied requirement
>> that Last_Index(V) < Index_Type'Base'Last, so it doesn't matter if
>> Last_Index(V) = Index_Type'Last.
>
> If there is such a requirement, it should be explicit. But I would be
> alarmed if such a requirement really was imposed: it would be a classic
> potential source of obscure bugs.
Not at all. If Last_Index(V) = Index_Type'Base'Last, and the element
isn't in the vector, then Find will raise Constraint_Error. Nothing
obscure about that...
>> Remember the purpose of a vector is to expand. That's what influenced our
>> decision to use an integer type as the index instead of any discrete type.
>> Your argument that Last_Index(V) might equal Index_Type'Last is tantamount
>> to saying that an enumeration type should be able to be used as the index
>> type, but this matter has already been debated and settled.
>
>
> I don't think so at all (that Last_Index(V) might equal Index_Type'Last
> is tantamount to saying that an enumeration type should be able to be
> used as the index type). On an implementation that has a 16-bit
> Integer'Base type, and where V is of a package instantiated with
> Index_Type Positive, it is quite feasible that Last_Index(V) becomes 32767.
Then you used the wrong index type. Say this instead:
type Index_Type is new Count_Type range 1 .. Count_Type'Last;
Remember also that Find must perform a linear scan of the vector. In
your vector that's over 32000 elements. Hmmm... Perhaps you should
consider using a different container.
>> Allowing an Index_Type to be passed as a generic formal is really only
>> intended to allow the user to specify the starting point of the index
>> subtype range. The ending point should be large relative to the number
>> of elements typically stored in the container.
>
>
> Why? Why should it (Index_Type'Last) not be equal to the maximum number
> of elements to be stored? It would be prudent to be sure that the
> maximum was never exceeded, in such a case.
But we're not arguing about Index_Type'Last, we're arguing about
Index_Type'Base'Last. If you want to define Index_Type'Last such that
it equals the maximum number of elements, then go right ahead.
> Let me give you an example. Supposing we are programming a card game,
> and we need a container that can contain a 'hand', which is a list of
> cards. The rules of this game do not limit the number of cards in a
> hand, as such, but since the cards come from a pack of 52, we can be
> absolutely certain that the limit of 52 will never be exceeded. It would
> therefore be quite reasonable (wouldn't it?) to write:
>
> type Card_Count is range 0..52;
> subtype Card_Index is Card_in_Hand_Count range 1..52;
> package Hand_Vectors is
> new Ada.Containers.Vectors (Card_ID, Card_Index);
> ...
> Dealer: Hand_Vectors.Vector := Random_52;
> Hands: array (Player_ID) of Hand_Vectors.Vector;
All you need to do to fix this is:
type Card_Count is range 0..53;
subtype Card_Index is Card_in_Hand_Count range 1..52;
package Hand_Vectors is
new Ada.Containers.Vectors (Card_ID, Card_Index);
and now all is well.
> We could move cards between Dealer and Hands throughout the program,
> safe in the knowledge that although a length of 52 is possible (certain,
> for Dealer), that length can never be exceeded.
Fine. See above.
>>> I think it would make much more sense for Generic_Sort to be declared:
>>> [not classwide]
>>> and the typecast to be done in the call instead. For example:
>>> ...
>>> Sort( Float_Lists.List(L1) );
>>>
>>> I prefer this because it makes it explicit that the list L1 is being
>>> sorted /as/ a Float_Lists.List (not as a My_List, as such).
>>
>>
>> But this argument applies to any operation. We shouldn't have to call
>> sort differently, just because it happens to be generic.
>
>
> We have to call Sort differently because its argument isn't of type
> Float_Lists.List. This fact applies to any non-inherited operation of
> List, generic or not. That is my point.
No, we don't have to call it differently. That's my point.
>> You could pass the list type as a generic actual, but that seems like
>> overkill when the operation only has a single parameter.
>
>
> I quite agree, and I'd argue the same for Generic_Merge (which only has
> two parameters). I'm also saying that making them classwide would be
> overkill.
But we want to *statically* check that both parameters have the same
specific type, something we can't do if the parameters have type List'Class.
>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
>
>
> What are the alternatives, please?
Same as for Generic_Sort or Generic_Merge.
>>>> [re: Set_Capacity(M,n) where n<Length(M)]
>>>
>>> I think this is a case where readability is more important than
>>> conciseness of code, and I think using an explicit length will
>>> usually be more readable. At least, I think its meaning is a bit more
>>> intuitively obvious.
>>
>>
>> The meaning of an operation is its postcondition.
>
>
> I'm not arguing about the semantic meaning of the operation, Matt! How
> can I put it more plainly? I'm saying that this is a matter of readability.
The problem is that users are probably going to have to check, prior to
making the call, in order to avoid raising Constraint_Error. But this
unnecessary check is only crying wolf, since nothing bad happens when
Capacity < Length(V).
> The semantic meanings of your suggested Set_Capacity(M,0) and of
> Set_Capacity(M,Length(M)) are identical, so this is simply not an issue.
>
> I am arguing that Set_Capacity(M,Length(M)) is more readable, and for
> that reason we should disallow Set_Capacity(M,0). That is all.
The fact that you don't like this locution is not a compelling reason
to disallow it.
This API specifies a postcondition, the operation satisfies the
postcondition, and therefore there is no exception. It's very simple.
Exceptions are not a mechanism for social engineering of software.
>>> Q5 was about using an extra parameter of an enumerated type in
>>> insertion and replacement procedures to indicate what should happen
>>> if the key exists
>>> (insertion) or doesn't exist (replacement), with a default indicating
>>> that an exception should be raised.
>>>
>>> I rather prefer this idea. I think it is the least of three evils (no
>>> user choice, a plethora of procedures, or control-coupling).
>>
>>
>> I have already stated that I am not in favor of this change. (It's at the
>> wrong level of abstraction, for one thing.)
>
>
> Okay, but I feel that arguments about the level of abstraction are a bit
> of a nicety here. Am I wrong that we are faced with three choices, none
> of which is plainly ideal?
The 5-parameter insert is already ideal, since it's completely general.
Any other behavior you want can be written in terms of the canonical
insert.
>>> Another question that arises in my mind is, for any of the functions
>>> which returns a container: what is the capacity of the result? In
>>> particular, should this be defined by the standard, or remain
>>> implementation defined? [Call this Q7?]
>>
>>
>> All this standard says is that Capacity(C) >= Length(C). If you want
>> more control of the capacity, then use the procedures, not the functions.
>
>
> So, are you saying that the standard /should/ leave it implementation
> defined what the capacity is?
Yes, of course. The standard needs to get out of the vendors'
way, too.
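[Editor's note: The Set_Capacity point argued above (a request below the current length is simply satisfied, so no exception is needed) can be sketched against Ada.Containers.Vectors as it was later standardized in Ada 2005, where the corresponding operation is named Reserve_Capacity. Treating Reserve_Capacity as the counterpart of the draft's Set_Capacity is an assumption of this note, as are the names Capacity_Demo and Int_Vectors. This is a minimal sketch, not the wording under discussion.]

```ada
with Ada.Containers;         use Ada.Containers;
with Ada.Containers.Vectors;

procedure Capacity_Demo is
   --  Hypothetical instantiation, used only for illustration.
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Int_Vectors.Vector;
begin
   for I in 1 .. 10 loop
      V.Append (I);
   end loop;

   --  Ask for a capacity below the current length.
   --  The postcondition Capacity >= Length still holds; nothing is raised.
   V.Reserve_Capacity (0);

   pragma Assert (V.Length = 10);
   pragma Assert (V.Capacity >= V.Length);
end Capacity_Demo;
```

The only observable guarantee is Capacity (V) >= Length (V); the exact capacity after the call is left to the implementation, which matches the position taken on Q7 above.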
****************************************************************
From: Nick Roberts
Sent: Tuesday, September 14, 2004 9:21 PM
Matthew Heaney wrote:
> Not at all. If Last_Index(V) = Index_Type'Base'Last, and the element
> isn't in the vector, then Find will raise Constraint_Error. Nothing
> obscure about that...
Okay, but is that really the behaviour we want?
> Then you used the wrong index type. Say this instead:
>
> type Index_Type is new Count_Type range 1 .. Count_Type'Last;
Okay, but I think this point might not be obvious to many programmers.
> Remember also that Find must perform a linear scan of the vector. In
> your vector that's over 32000 elements. Hmmm... Perhaps you should
> consider using a different container.
I think this kind of scan will be appropriate for some applications. The
vector might contain over 32000 elements, but the scan might not start at
the beginning of the vector. Anyway, I don't think this point is very relevant.
> But we're not arguing about Index_Type'Last, we're arguing about
> Index_Type'Base'Last. If you want to define Index_Type'Last such that
> it equals the maximum number of elements, then go right ahead.
But in general, if I declare:
type T is range A..B;
I cannot know that T'B < T'Base'B.
> All you need to do to fix this is:
>
> type Card_Count is range 0..53;
> subtype Card_Index is Card_in_Hand_Count range 1..52;
> package Hand_Vectors is
> new Ada.Containers.Vectors (Card_ID, Card_Index);
>
> and now all is well.
Except that Card_Count having a range of 0..53 just for the convenience of
the Find function is surely poor programming?
Incidentally, I made a boob:
subtype Card_Index is Card_in_Hand_Count range 1..52;
was meant to be:
subtype Card_Index is Card_Count range 1..52;
Sorry.
>> We have to call Sort differently because its argument isn't of type
>> Float_Lists.List. This fact applies to any non-inherited operation of
>> List, generic or not. That is my point.
>
> No, we don't have to call it differently. That's my point.
I'm pretty sure that we do. For example, if I declare:
package Thing_Vectors is new Ada.Containers.Vectors(Thing,Positive);
procedure Sort is new Thing_Vectors.Generic_Sort; -- non-inherited op
...
procedure Foo (Them: in out Thing_Vectors.Vector); -- non-inherited op
...
type Other_Vector is new Thing_Vectors.Vector;
...
V2: Other_Vector;
...
Sort (Thing_Vectors.Vector(V2));
Foo (Thing_Vectors.Vector(V2));
we must typecast V2 for the calls to both Sort and Foo, because they are
both non-inherited operations (and not specially declared for Other_Vector).
>> I quite agree, and I'd argue the same for Generic_Merge (which only
>> has two parameters). I'm also saying that making them classwide would
>> be overkill.
>
> But we want to *statically* check that both parameters have the same
> specific type, something we can't do if the parameters have type
> List'Class.
You've got hold of the wrong end of the stick, Matt! I am saying myself
that the parameters to both Generic_Sort and Generic_Merge should /not/ be
classwide. I am agreeing with you! I /agree/ that we want to statically
check that both parameters (to any call of any instantiation of
Generic_Merge) are of the same type.
I was saying that I think Generic_Sort should not have a classwide
parameter, because it would be better style for derived types (derived from
Vector or List) to be explicitly typecast in a call to an instantiation of
Generic_Sort, than to be opaquely typecast within the implementation.
>>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
>>
>> What are the alternatives, please?
>
> Same as for Generic_Sort or Generic_Merge.
I don't quite understand this, since Generic_Sort and Generic_Merge are
generic procedures, and Ordered_Sets.Generic_Keys is a generic package.
However, if you're suggesting that parameters of type Set in this package
be made classwide (Set'Class), I would prefer not, for similar reasons as
above.
> The fact that you don't like this locution is not a compelling reason
> to disallow it.
I wasn't trying to argue that it is a /compelling/ reason, only that it is
a reason.
Anyway, what do you think of the idea of changing the name of Set_Capacity
to Set_Minimum_Capacity, or its Capacity parameter to Minimum?
>> Okay, but I feel that arguments about the level of abstraction are a
>> bit of a nicety here. Am I wrong that we are faced with three choices,
>> none of which is plainly ideal?
>
> The 5-parameter insert is already ideal, since it's completely general.
> Any other behavior you want can be written in terms of the canonical
> insert.
Fair enough. I still think the suggested versions of Insert (with an
If_Exists parameter) and Replace (with an If_Nonexistent parameter) should
be added, on the grounds of significant convenience.
>> So, are you saying that the standard /should/ leave it implementation
>> defined what the capacity is?
>
> Yes, of course. The standard needs to get out of the vendors'
> way, too.
Yes, I agree with this.
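[Editor's note: The claim that an If_Exists style of insertion "can be written in terms of the canonical insert" can be sketched as follows. The sketch assumes Ada.Containers.Ordered_Maps as later standardized in Ada 2005, whose five-parameter Insert reports the outcome through Position and Inserted. The procedure Insert_With_Policy and the type Exists_Policy are hypothetical names invented for this note; they are not part of any proposal in this thread.]

```ada
with Ada.Containers.Ordered_Maps;

procedure Insert_Policy_Demo is
   package Maps is new Ada.Containers.Ordered_Maps
     (Key_Type => Integer, Element_Type => Integer);

   --  Hypothetical policy type, standing in for the If_Exists parameter.
   type Exists_Policy is (Raise_Error, Keep_Old, Replace_Old);

   procedure Insert_With_Policy
     (Container : in out Maps.Map;
      Key       : in     Integer;
      New_Item  : in     Integer;
      If_Exists : in     Exists_Policy := Raise_Error)
   is
      Position : Maps.Cursor;
      Inserted : Boolean;
   begin
      --  The canonical five-parameter insert never raises for a duplicate key.
      Maps.Insert (Container, Key, New_Item, Position, Inserted);

      if not Inserted then
         case If_Exists is
            when Raise_Error =>
               raise Constraint_Error;
            when Keep_Old =>
               null;
            when Replace_Old =>
               Maps.Replace_Element (Container, Position, New_Item);
         end case;
      end if;
   end Insert_With_Policy;

   M      : Maps.Map;
   Raised : Boolean := False;
begin
   Insert_With_Policy (M, 1, 10);
   Insert_With_Policy (M, 1, 20, If_Exists => Keep_Old);
   pragma Assert (Maps.Element (M, 1) = 10);

   Insert_With_Policy (M, 1, 30, If_Exists => Replace_Old);
   pragma Assert (Maps.Element (M, 1) = 30);

   begin
      Insert_With_Policy (M, 1, 40);  --  default policy raises
   exception
      when Constraint_Error =>
         Raised := True;
   end;
   pragma Assert (Raised);
end Insert_Policy_Demo;
```

Whether such a wrapper belongs in the standard package or in user code is exactly the convenience-versus-minimality question being debated above.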
****************************************************************
From: Randy Brukardt
Sent: Tuesday, September 14, 2004 9:40 PM
...
> we must typecast V2 for the calls to both Sort and Foo, because they are
> both non-inherited operations (and not specially declared for
> Other_Vector).
You never have to use a type conversion to call a routine with a class-wide
parameter of a root class, which is the case here.
Our "standard" for Claw was that a specific typed non-primitive parameter
represented a bug, as it would just restrict the uses of the library for no
benefit. We in fact wrote checking for that into our help file generator,
and later removed most of the ones found. (There were a couple of cases
where extensions would have been real problems, functions that returned
objects or for procedures with 'out' parameters. Neither applies to
Generic_Sort.)
The same appears to be true for the Containers libraries.
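[Editor's note: The difference between a specific-typed non-primitive parameter and a class-wide one can be sketched with a small stand-in hierarchy. The packages Roots, Derived and Ops, and all names in them, are hypothetical and merely play the roles of Thing_Vectors.Vector, Other_Vector and an instantiation of Generic_Sort; no container package is used here.]

```ada
procedure Classwide_Demo is

   package Roots is
      type Root is tagged record
         Size : Natural := 0;
      end record;
   end Roots;

   package Derived is
      type Child is new Roots.Root with null record;
   end Derived;

   --  Declared outside Roots, so neither operation is primitive
   --  and neither is inherited by Child.
   package Ops is
      function Size_Specific  (X : Roots.Root)       return Natural;
      function Size_Classwide (X : Roots.Root'Class) return Natural;
   end Ops;

   package body Ops is
      function Size_Specific (X : Roots.Root) return Natural is
      begin
         return X.Size;
      end Size_Specific;

      function Size_Classwide (X : Roots.Root'Class) return Natural is
      begin
         return X.Size;  --  no dispatching involved
      end Size_Classwide;
   end Ops;

   C : Derived.Child;
begin
   C.Size := 4;

   --  Specific parameter: an explicit view conversion is required.
   pragma Assert (Ops.Size_Specific (Roots.Root (C)) = 4);

   --  Class-wide parameter: the derived object is accepted directly.
   pragma Assert (Ops.Size_Classwide (C) = 4);
end Classwide_Demo;
```

Note that the class-wide version performs no dispatching at all, which is the situation described for Claw's Root_Window_Type'Class operations later in the thread.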
****************************************************************
From: Matthew Heaney
Sent: Wednesday, September 15, 2004 2:28 AM
> > But we're not arguing about Index_Type'Last, we're arguing about
> > Index_Type'Base'Last. If you want to define Index_Type'Last such that
> > it equals the maximum number of elements, then go right ahead.
>
> But in general, if I declare:
>
> type T is range A..B;
>
> I cannot know that T'B < T'Base'B.
That's because you declared T incorrectly:
type T_Base is range A .. B + 1;
type T is new T_Base range A .. B;
> > All you need to do to fix this is:
> >
> > type Card_Count is range 0..53;
> > subtype Card_Index is Card_in_Hand_Count range 1..52;
> > package Hand_Vectors is
> > new Ada.Containers.Vectors (Card_ID, Card_Index);
> >
> > and now all is well.
>
> Except that Card_Count having a range of 0..53 just for the convenience of
> the Find function is surely poor programming?
This simply reflects Ada95 idioms for type declarations. You have to
declare a pseudo base type whose (base) range has the range required, and
then declare the real type as a subtype or derived type that restricts the
range. See the "sum of 3 numbers" thread on CLA from a few months ago.
> >>> We also need to affirm the declaration of Ordered_Sets.Generic_Keys.
> >>
> >> What are the alternatives, please?
> >
> > Same as for Generic_Sort or Generic_Merge.
>
> I don't quite understand this, since Generic_Sort and Generic_Merge are
> generic procedures, and Ordered_Sets.Generic_Keys is a generic package.
>
> However, if you're suggesting that parameters of type Set in this package
> be made classwide (Set'Class), I would prefer not, for similar reasons as
> above.
I'm leaning myself towards importing the set type as a generic formal.
We'll have to see how the ARG weighs in this weekend.
> Anyway, what do you think of the idea of changing the name of Set_Capacity
> to Set_Minimum_Capacity, or its Capacity parameter to Minimum?
Too much verbosity. (That's the same reason I'm not crazy about the name
"Insert_Or_Replace".)
****************************************************************
From: Randy Brukardt
Sent: Wednesday, September 15, 2004 6:19 PM
...
> > Our "standard" for Claw was that a specific typed non-primitive parameter
> > represented a bug, as it would just restrict the uses of the library for no
> > benefit.
>
> But it wouldn't actually /restrict/ uses of the subprogram, would it? It
> would simply mean that the user would have to write an explicit type
> conversion for an object or expression of a derived type, wouldn't it?
Writing unnecessary type conversions *is* a bug. The reason for wanting explicit
type conversions is to indicate the possibility of a problem (for conversions that can fail or
lose precision). Neither is true here. It's the same reason that Ada 05 expands the
use of anonymous access types -- if you have conversions that can't fail, they're not
interesting conversions -- they just clutter the code.
> On the other hand, doesn't making the parameter class-wide restrict new
> overloadings of the subprogram (for a derived type)? I think that could be
> quite inconvenient occasionally.
"Occasionally", perhaps. If you use a ton of use clauses. Otherwise, it's a
non-problem, because the new routine would necessarily be in a different package.
> > We in fact wrote checking for that into our help file generator,
> > and later removed most of the ones found. (There were a couple of cases
> > where extensions would have been real problems, functions that returned
> > objects or for procedures with 'out' parameters. Neither applies to
> > Generic_Sort.)
>
> I think this is the kind of problem I'm worried about. For a container type
> T1, Generic_Sort may be instantiated to a procedure named Sort (or
> whatever) in one phase of software construction, which may then get frozen,
> and then a type T2 may be derived from T1 in a later phase. It might be an
> annoyance not to be able to declare a new overloaded Sort (or whatever) for
> T2. I suppose you'd have to name it something like Sort_T2. Not a disaster,
> but an annoyance. (Of course, you /could/ declare the overloaded Sort, but
> you couldn't call it unambiguously.)
No problem, just prefix it to call it unambiguously. Moreover, if the Sort is
class-wide, you don't even need to do this new routine; just call the original
one. (If the instantiation is in the package with the type declaration, you can
even do that with the prefix notation without any extra with or use clauses.)
> > The same appears to be true for the Containers libraries.
>
> It does seem to be a similar situation. I have often found difficulty in
> deciding on these kinds of details of the design, for packages which export
> tagged types. I generally find that it's a mistake to try to be
> too 'clever'.
I agree. My rule is that all operations are either primitive or class-wide unless
they are creating a new object of the type.
> A class-wide parameter implies a dispatching implementation, doesn't it?
No, not to me. It implies an operation that is *meaningful* to all members of a type.
How it's implemented is not relevant.
> However, the implementation of Generic_Sort would not do any dispatching;
> it would simply have to internally typecast the parameter to the root type.
That would be open to debate. It certainly would be easier to define it this way,
but it would make sense for it to dispatch to the primitive operations of the type.
(But that probably would be a mistake for performance reasons.)
> I feel that this fact indicates a bug. I think, for this reason, it would
> not be the right design to make the parameter(s) class-wide for
> Generic_Sort or for Generic_Merge.
Nope, it's irrelevant in my view. Claw has an entire package of operations on
Root_Window_Type'Class, and there isn't a single dispatching operation in the bunch.
They are all operations that make sense on any window; the implementation ought to
be irrelevant.
****************************************************************
From: Robert A. Duff
Sent: Wednesday, September 15, 2004 6:55 AM
> That's because you declared T incorrectly:
>
> type T_Base is range A .. B + 1;
> type T is new T_Base range A .. B;
I think you want:
subtype T is T_Base range A .. B;
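[Editor's note: The "pseudo base type" idiom discussed above can be sketched with the card example from earlier in the thread. Declaring the type one value wider than the index subtype guarantees that Last + 1 is representable, whatever base range the implementation picks. The name Base_Range_Demo is an assumption of this note.]

```ada
procedure Base_Range_Demo is
   --  The type is declared one wider than needed ...
   type Card_Count is range 0 .. 53;

   --  ... and the index is a subtype that restricts the range.
   subtype Card_Index is Card_Count range 1 .. 52;

   --  One past the last index; representable because the base range
   --  of Card_Count must include 0 .. 53.
   Past_End : constant Card_Count'Base := Card_Index'Last + 1;
begin
   pragma Assert (Card_Index'Last = 52);
   pragma Assert (Card_Index'Base'Last >= 53);
   pragma Assert (Card_Index'Last < Card_Index'Base'Last);
   pragma Assert (Past_End = 53);
end Base_Range_Demo;
```

With "type Card_Count is range 0 .. 52;" alone, the last assertion about the base range would depend on the implementation's choice of base range, which is the portability concern raised in this exchange.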
****************************************************************
From: Nick Roberts
Sent: Wednesday, September 15, 2004 9:19 AM
>>But in general, if I declare:
>>
>> type T is range A..B;
>>
>>I cannot know that T'B < T'Base'B.
>
> That's because you declared T incorrectly:
>
> type T_Base is range A .. B + 1;
> type T is new T_Base range A .. B;
True, but it might not be obvious to a user that this must be done for an
instantiation of Ada.Containers.Vectors.
> This simply reflects Ada95 idioms for type declarations. You have to
> declare a pseudo base type whose (base) range has the range required, and
> then declare the real type as a subtype or derived type that restricts the
> range. See the "sum of 3 numbers" thread on CLA from a few months ago.
Yes, you are quite right.
However, does this suggest that we need to add a second assertion?
pragma Assert (Index_Type'Last < Index_Type'Base'Last);
I don't really see why we don't just return Index_Type'First-1; it's what
the string packages do.
> I'm leaning myself towards importing the set type as a generic formal.
> We'll have to see how the ARG weighs in this weekend.
I think I see the logic of this idea, but I feel there is a danger that it
might be a bit confusing for some users.
>>Anyway, what do you think of the idea of changing the name of
>>Set_Capacity
>>to Set_Minimum_Capacity, or its Capacity parameter to Minimum?
>
> Too much verbosity. (That's the same reason I'm not crazy about the name
> "Insert_Or_Replace".)
Okay, but isn't verbosity a lesser sin than the potential for
misinterpretation? Up to a point, certainly. I'm not suggesting a name such
as Set_Capacity_to_This_Unless_It_Is_Less_Than_The_Length_In_Which_Case_...
Anyway, changing the parameter name from 'Capacity' to 'Minimum' isn't
increasing the verbosity (it would actually reduce it by one letter :-)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, September 15, 2004 9:28 AM
Randy wrote:
>> Q1) Find_Index returns Last_Index (Container) + 1 if the element is not
>> found. This seems consistent to me (it's past the end of the container in
>> a forward search), but Matt worries that First_Index (Container) - 1
>> might be thought of as better. The trouble with First_Index (Container) -
>> 1 is that you can't put it into an object:
>> declare
>> I : Index_Type := Index_Type'First;
>> begin
>> I := Find_Index (Vect, Item, I);
>> while I <= Last_Index (Vect) loop
>> -- Do something to the element I.
>> I := Find_Index (Vect, Item, I+1);
>> end loop;
>> end;
>> If Find_Index returned Index_Type'First - 1, saving the result of
>> Find_Index would raise Constraint_Error if the item is not found. That's
>> not what we want, I think.
Actually, I think this example is wrong anyway. Object I should have
type Index_Type'Base. You could write instead (see below):
declare
I : Index_Type'Base := Find_Index (Vect, Item);
begin
while I /= No_Index loop
-- Do something to the element I.
I := Find_Index (Vect, Item, I+1);
end loop;
end;
One issue with Randy's original formulation is that there's a constraint
check every time object I is assigned the value returned by Find.
We could always liberalize what is acceptable as the value of the Index
parameter of Find. Right now, we raise CE if Index < Index_Type'First,
but it might make sense to allow No_Index (actually, any value less than
Index_Type'First) as the value, and interpret that to mean begin the
search at Index_Type'First.
Nick Roberts responded:
> The problem with Last_Index (Container) + 1 is that it may not exist,
> because Last_Index (Container) might be Index_Type'Last. On the other hand,
> we conveniently have a requirement that Index_Type'First >
> Index_Type'Base'First, which guarantees that First_Index (Container) - 1
> does always exist (as a value of Index_Type'Base).
>
> I think that swings it. I suggest Find_Index returns First_Index
> (Container) - 1 when it does not find what it is looking for.
...
This is actually closer to how std::string works. If a search fails, it
returns std::string::npos, which is defined as string::size_type(-1).
It would be similar to what you have above:
No_Index : constant Index_Type'Base :=
Index_Type'Pred (Index_Type'First);
We would then have to affirm whether To_Index should raise CE if given
No_Element as the argument, or return the value No_Index instead. (Our
original motivation for raising CE was that we didn't know what index
value to return, but if we declare No_Index as a distinguished value,
then we really do have a value to return that makes sense.)
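[Editor's note: The No_Index search loop above can be sketched against Ada.Containers.Vectors as later standardized in Ada 2005, which declares No_Index and the subtype Extended_Index (a subtype of Index_Type'Base). Using the standardized package in place of the draft under discussion is an assumption of this note, as are the names Find_Loop_Demo, Int_Vectors and Hits.]

```ada
with Ada.Containers.Vectors;

procedure Find_Loop_Demo is
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Int_Vectors;

   V    : Vector;
   Hits : Natural := 0;

   --  The search result may be No_Index, so it is held in the
   --  extended subtype rather than in Index_Type itself.
   I : Extended_Index;
begin
   V.Append (3);
   V.Append (1);
   V.Append (3);
   V.Append (2);
   V.Append (3);

   I := Find_Index (V, 3);
   while I /= No_Index loop
      Hits := Hits + 1;                 --  do something with element I
      I := Find_Index (V, 3, I + 1);    --  resume just past the match
   end loop;

   pragma Assert (Hits = 3);
   pragma Assert (Find_Index (V, 7) = No_Index);
end Find_Loop_Demo;
```

The loop relies on a search that starts past Last_Index simply failing; it would still overflow if a match sat at Index_Type'Base'Last, which is the corner case debated earlier in the thread.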
****************************************************************
From: Matthew Heaney
Sent: Wednesday, September 15, 2004 9:31 AM
> However, does this suggest that we need to add a second assertion?
>
> pragma Assert (Index_Type'Last < Index_Type'Base'Last);
>
> I don't really see why we don't just return Index_Type'First-1; it's
> what the string packages do.
I would be in favor of declaring an object of type Index_Type'Base,
named No_Index (or whatever), whose value is Index_Type'First - 1.
Find_Index and Reverse_Find_Index would return No_Index when the search
fails.
I would also be in favor of function To_Index returning No_Index when
the parameter has the cursor value No_Element (instead of raising CE).
See my previous post for the details.
Note that you can't write that assertion, because it would fail for
generic actual types like Natural or Positive.
****************************************************************
From: Nick Roberts
Sent: Wednesday, September 15, 2004 5:25 PM
> I would be in favor of declaring an object of type Index_Type'Base,
> named No_Index (or whatever), whose value is Index_Type'First - 1.
> Find_Index and Reverse_Find_Index would return No_Index when the search
> fails.
Specifically, I suggest the following declaration is inserted into the
Ada.Containers.Vectors package specification:
No_Index: constant Index_Type'Base := Index_Type'First - 1;
> I would also be in favor of function To_Index returning No_Index when
> the parameter has the cursor value No_Element (instead of raising CE).
>
> See my previous post for the details.
I suggest the wording for To_Index in this package is changed to:
function To_Index (Position : Cursor) return Index_Type'Base;
If Position is No_Element, To_Index returns No_Index. Otherwise, it
returns the index (within its containing vector) of the element
designated by Position.
I suggest the wording for the Find_Index and Reverse_Find_Index functions
is changed to:
function Find_Index (Container : Vector;
Item : Element_Type;
Index : Index_Type'Base := Index_Type'First)
return Index_Type'Base;
Searches the elements of Container for an element equal to Item,
starting at position Index. If Index is less than Index_Type'First, then
the search begins at Index_Type'First. If there are no elements in the
range Index .. Last_Index (Container) equal to Item, then Find_Index
returns No_Index. Otherwise, it returns the index of the matching
element with the lowest index.
function Reverse_Find_Index (Container : Vector;
Item : Element_Type;
Index : Index_Type'Base := Index_Type'Last)
return Index_Type'Base;
Searches the elements of Container in reverse for an element equal to
Item, starting at position Index. If Index is greater than Last_Index
(Container), then the search begins at position Last_Index (Container).
If there are no elements in the range Index_Type'First .. Index equal to
Item, then Reverse_Find_Index returns No_Index. Otherwise, it returns
the index of the matching element with the highest index.
I've also added the wording "with the lowest index" and "with the highest
index" in this suggestion, in the hope it increases clarity slightly.
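[Editor's note: The suggested semantics (lowest matching index for Find_Index, highest for Reverse_Find_Index, and No_Index for a failed search or for No_Element) can be checked with a short sketch. It assumes Ada.Containers.Vectors as later standardized in Ada 2005 rather than the draft text above; the names No_Index_Demo and Int_Vectors are hypothetical.]

```ada
with Ada.Containers.Vectors;

procedure No_Index_Demo is
   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Int_Vectors;

   V : Vector;
begin
   --  Contents: (3, 1, 3, 2, 3) at indices 1 .. 5.
   V.Append (3);
   V.Append (1);
   V.Append (3);
   V.Append (2);
   V.Append (3);

   --  No_Index is one below the first index.
   pragma Assert (No_Index = Positive'First - 1);

   --  Forward search yields the lowest matching index.
   pragma Assert (Find_Index (V, 3) = 1);

   --  Reverse search yields the highest matching index at or below Index.
   pragma Assert (Reverse_Find_Index (V, 3) = 5);
   pragma Assert (Reverse_Find_Index (V, 3, Index => 4) = 3);

   --  A failed search and a null cursor both map to No_Index.
   pragma Assert (Find_Index (V, 7) = No_Index);
   pragma Assert (To_Index (No_Element) = No_Index);

   --  A valid cursor maps to the index of its element.
   pragma Assert (To_Index (Find (V, 2)) = 4);
end No_Index_Demo;
```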
****************************************************************
From: Randy Brukardt
Sent: Wednesday, September 15, 2004 7:01 PM
> I would be in favor of declaring an object of type Index_Type'Base,
> named No_Index (or whatever), whose value is Index_Type'First - 1.
> Find_Index and Reverse_Find_Index would return No_Index when the search
> fails.
That would be OK, except that you are assuming that users will somehow magically use
Index_Type'Base when using Find, but use Index_Type for everything else. That's very
impractical; I often will put the result of operations like Find directly into the
resulting data structure (sometimes in an aggregate), and it would be awful to have
to use 'Base all over. Moreover, how would anyone remember to use 'Base? I don't think
I've ever written 'Base outside of a generic unit; it certainly wouldn't be the first
thing I'd think of.
So I view this proposal as being the same as saying that Find raises Constraint_Error
whenever the object is not found. Moreover, this isn't documented, so most users will
have to find it out by trial and error. Unless they use the container very frequently,
this will be a major gotcha.
> I would also be in favor of function To_Index returning No_Index when
> the parameter has the cursor value No_Element (instead of raising CE).
That means that we have to carefully study every operation to see if the behavior for
No_Index is proper. Since most of the cursor operations are defined in terms of
To_Index, that is going to be a big job. (And most likely, some of the index
operations should *not* raise an exception if given No_Index - Find_Index for
example.)
While I'm sure we can come up with a consistent semantics, such a major rewrite will
most likely prevent the AI from being approved at this meeting. (I will be completely
opposed to approving any AIs with major changes; most of the ones approved in the past
were too full of holes to complete.)
You'd also be giving more ammunition to those who claim that this container library
isn't mature enough to standardize. Even though this is arguably a corner-case, it
gives the impression of continuing flux in the interface.
So, all in all, I'd rather leave the whole thing alone; it's insufficiently broken --
the only problem is that a failed Find would raise Constraint_Error if the array is
full. That isn't a major issue - I could argue that failed Find always should raise
an exception (that's what Claw does in most cases) - and in any case, lots of
operations are going to fail with a full array. Don't do it. :-)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, September 15, 2004 11:25 PM
> > I would be in favor of declaring an object of type Index_Type'Base,
> > named No_Index (or whatever), whose value is Index_Type'First - 1.
> > Find_Index and Reverse_Find_Index would return No_Index when the
> > search fails.
>
> That would be OK, except that you are assuming that users will
> somehow magically use Index_Type'Base when using Find, but
> use Index_Type for everything else. That's very impractical;
> I often will put the result of operations like Find directly
> into the resulting data structure (sometimes in an
> aggregate), and it would be awful to have to use 'Base all
> over. Moreover, how would anyone remember to use 'Base? I
> don't think I've ever written 'Base outside of a generic
> unit; it certainly wouldn't be the first thing I'd think of.
This is the most compelling argument (for retaining the current semantics).
I often assume most Ada users are like Bob Duff, and sometimes need
reminding that there are very few Bob Duffs...
> So I view this proposal as being the same as saying that Find
> raises Constraint_Error whenever the object is not found.
> Moreover, this isn't documented, so most users will have to
> find it out by trial and error. Unless they use the container
> very frequently, this will be a major gotcha.
Agreed.
...
> So, all in all, I'd rather leave the whole thing alone; it's
> insufficiently broken -- the only problem is that a failed
> Find would raise Constraint_Error if the array is full. That
> isn't a major issue - I could argue that failed Find always
> should raise an exception (that's what Claw does in most
> cases) - and in any case, lots of operations are going to
> fail with a full array. Don't do it. :-)
Agreed.
****************************************************************
From: Robert A. Duff
Sent: Thursday, September 16, 2004 10:27 AM
> This is the most compelling argument (for retaining the current semantics).
> I often assume most Ada users are like Bob Duff, and sometimes need
> reminding that there are very few Bob Duffs...
Well, Bob Duff doesn't want to use 'Base all over the place, either.
It seems to me that if there's a "special" value returned in the "not
found" case, it is entirely Good and Right to declare a constant called
Not_Found or some such. And there should be a subtype that includes
that value plus all the normal index values. Whether you're putting the
result of Find functions in data structures or local variables, you
should use that subtype if the result might be Not_Found -- or you can
assert that it *will* be found by using the normal index subtype.
None of this requires using 'Base "all over" -- perhaps once in the
generic, and none in client code.
I don't think this is some sort of bobduffian arcanity -- it's no
different from declaring a subtype 0..N to count the number of Things,
and another subtype 1..N to index them.
Aside: this is the same reason why Ada desperately needs a "not null"
constraint on access subtypes. Otherwise, there's no way to express the
difference between "X points at a Thing" versus "X either points at a
Thing or has a special null value". This is the source of numerous
bugs, IME.
****************************************************************
From: Matthew Heaney
Sent: Thursday, September 16, 2004 12:21 PM
The function Last_Index returns Index_Type'Base too, to handle the case
of an empty vector. Indeed, handling the result of Last_Index was the
reason we settled on an integer index type. You could argue for
declaring a No_Index value on the basis of Last_Index alone.
Note also that Get_Line returns Line'First - 1 (or is it 0?) when the
line is empty. And Nick has pointed out that the search functions in
Ada.Strings.* return 0 to indicate not found.
So there is precedent for needing to handle a "special" value for
index-based functions.
We already have an Index_Subtype in the spec, that simply renames the
generic formal type. One way to avoid having to say IT'Base is to
define that subtype as:
   subtype Index_Subtype is Index_Type'Base
      range Index_Type'First - 1 .. Index_Type'Last;
   No_Index : constant Index_Subtype := Index_Subtype'First;
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 16, 2004 1:15 PM
It would need a different name if you were to do that; "Index_Subtype"
doesn't imply the correct semantics (it implies a constraint, not the lack
of one). Moreover, as Bob points out, you really need both, and it would
seem weird to define only one or the other. We usually name these subtypes
<something>_Count and <something>_Index (or <something>_Indices); the first
because of the strong precedent in Streams, Storage_Elements, and Direct_IO,
and the second to make it clear what the purpose is. (Direct_IO uses
"Positive_Count", which is more confusing than anything.)
The problem with that naming is that we already have a separate type for
Counts. So I can't think of an appropriate name.
And, in any case, my concerns about making a significant change here still
apply. This is such a minor issue that it just doesn't seem worth the effort
of checking and possibly changing the definition of every routine in the
vector package.
****************************************************************
From: Nick Roberts
Sent: Thursday, September 16, 2004 5:51 PM
> The problem with that naming is that we already have a separate type for
> Counts. So I can't think of an appropriate name.
I suggest:
   subtype Index_Subtype is Index_Type;
   subtype Extended_Index is
      Index_Type'Base range Index_Type'First - 1 .. Index_Type'Last;
   No_Index : constant Extended_Index := Index_Type'First - 1;
   function To_Index (Position : Cursor) return Extended_Index;
   function Last_Index (Container : Vector) return Extended_Index;
   function Find_Index (Container : Vector;
                        Item      : Element_Type;
                        Index     : Index_Type'Base := Index_Type'First)
      return Extended_Index;
   function Reverse_Find_Index (Container : Vector;
                                Item      : Element_Type;
                                Index     : Index_Type'Base := Index_Type'Last)
      return Extended_Index;
and change the wording for To_Index and Last_Index to:
   function To_Index (Position : Cursor) return Extended_Index;
      If Position is No_Element, To_Index returns No_Index. Otherwise, it
      returns the index (within its containing vector) of the element
      designated by Position.
   function Last_Index (Container : Vector) return Extended_Index;
      If Container is empty, Last_Index returns No_Index; otherwise, it
      returns the position of the last element in Container.
Some other wordings may need to be changed:
Except for the wording of To_Cursor, change every occurrence of "If Index
is not in the range First_Index (Container) .. Last_Index (Container)" or
"If Index does not specify a value in the range First_Index (Container) ..
Last_Index (Container)" to "If Container is empty or Index is not in the
range First_Index (Container) .. Last_Index (Container)".
The wording for the To_Cursor function appears to work without change.
The wordings for all the Insert and Insert_Space procedures appear to work
without change (as a side-benefit of No_Index = Index_Type'First - 1).
Incidentally, the wordings for the Insert and Insert_Space procedures which
have a Position out-parameter appear to suggest that Position is not set to
anything if Length (New_Item) = 0. Is this correct?
The definitions of Append all appear to work without change. Likewise for
Delete, Delete_First, and Delete_Last. (The wording for Delete_Last does
actually work.)
As a very minor point, the wording for the Delete procedure (with Count)
has "Any exceptions raised during element assignment are propagated."
Should this be "Any exception propagated by an element assignment is
propagated by Delete."?
Also add the wording (after the package spec):
No_Index represents a position that does not correspond to any element.
The subtype Extended_Index covers the indices covered by Index_Subtype
plus the value No_Index.
I've already suggested wordings for Find_Index and Reverse_Find_Index.
I don't think anything else is affected.
It seems quite neat to me, in a way, that the declaration of
Extended_Index would supplant the assertion (pragma), as it would itself
cause any instantiation of Ada.Containers.Vectors with Index_Type'First =
Index_Type'Base'First to fail.
> And, in any case, my concerns about making a significant change here still
> apply. This is such a minor issue that it just doesn't seem worth the effort
> of checking and possibly changing the definition of every routine in the
> vector package.
For what it's worth, I would like to see these changes made. I don't feel
that it would be a major change, and I do feel it would be worth it. But I
guess things are getting close to the bone now.
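[Editor's note: The following is a minimal sketch of how client code might
use the proposed Extended_Index and No_Index declarations. It is not part
of the original message. The concrete Index_Type, the Element_Array type,
and the array-based Find_Index are hypothetical stand-ins for the generic
formals and the vector operation; only the two declarations marked as
proposed come from the text above.]

```ada
with Ada.Text_IO;

procedure Extended_Index_Demo is
   --  Hypothetical stand-ins for the vector package's generic formals.
   subtype Index_Type is Integer range 1 .. 10;
   type Element_Array is array (Index_Type range <>) of Character;

   --  The declarations proposed in the message above.
   subtype Extended_Index is
     Index_Type'Base range Index_Type'First - 1 .. Index_Type'Last;
   No_Index : constant Extended_Index := Index_Type'First - 1;

   --  Simplified linear search over an array (an assumption for
   --  illustration, not the vector package's Find_Index).
   function Find_Index
     (Items : Element_Array;
      Item  : Character) return Extended_Index is
   begin
      for I in Items'Range loop
         if Items (I) = Item then
            return I;
         end if;
      end loop;
      return No_Index;
   end Find_Index;

   Data : constant Element_Array := "abc";

   --  Use Extended_Index when the search may fail ...
   Miss : constant Extended_Index := Find_Index (Data, 'z');

   --  ... and Index_Type to assert that it succeeds; a failed search
   --  would raise Constraint_Error on this declaration.
   Hit  : constant Index_Type := Find_Index (Data, 'b');
begin
   if Miss = No_Index then
      Ada.Text_IO.Put_Line ("'z' not found");
   end if;
   Ada.Text_IO.Put_Line ("'b' found at" & Index_Type'Image (Hit));
end Extended_Index_Demo;
```

[Note that 'Base appears only once, in the subtype declaration; the client
code names only Index_Type and Extended_Index.]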
****************************************************************
From: Jeffrey Carter
Sent: Friday, September 17, 2004 1:49 AM
Robert A Duff wrote:
>
> It seems to me that if there's a "special" value returned in the "not
> found" case, it is entirely Good and Right to declare a constant called
> Not_Found or some such. And there should be a subtype that includes
> that value plus all the normal index values. Whether you're putting the
> result of Find functions in data structures or local variables, you
> should use that subtype if the result might be Not_Found -- or you can
> assert that it *will* be found by using the normal index subtype.
There's a simple solution to the question of what to return if a Find
operation doesn't find the specified value. There is a software
engineering principle that a value should have one and only one
interpretation. Returning an index from a Find operation and having a
special value for the not-found case violates this principle.
The solution is to use a single value to indicate if the operation found
the value, and a 2nd value to indicate the index at which it was found
if it was found:
   type Find_Result (Found : Boolean) is record
      case Found is
         when False =>
            null;
         when True =>
            Index : Index_Value;
      end case;
   end record;
Now there is no requirement that the range of index values be smaller
than its base type (at least for the Find operation).
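[Editor's note: The following is a minimal sketch of how a Find operation
returning such a record might be written and used. It is not part of the
original message. The Index_Value subtype, the Element_Array type, and the
Find function are hypothetical; only the Find_Result declaration comes
from the text above.]

```ada
with Ada.Text_IO;

procedure Find_Result_Demo is
   --  Hypothetical index subtype and data, assumed for illustration.
   subtype Index_Value is Positive range 1 .. 5;
   type Element_Array is array (Index_Value) of Character;

   --  The record type from the message above.
   type Find_Result (Found : Boolean) is record
      case Found is
         when False =>
            null;
         when True =>
            Index : Index_Value;
      end case;
   end record;

   --  Linear search; no value of Index_Value is reserved as a sentinel.
   function Find
     (Items : Element_Array;
      Item  : Character) return Find_Result is
   begin
      for I in Items'Range loop
         if Items (I) = Item then
            return (Found => True, Index => I);
         end if;
      end loop;
      return (Found => False);
   end Find;

   Data   : constant Element_Array := "hello";
   Result : constant Find_Result   := Find (Data, 'l');
begin
   --  Index exists only when Found is True; reading Result.Index
   --  otherwise would raise Constraint_Error.
   if Result.Found then
      Ada.Text_IO.Put_Line ("found at" & Index_Value'Image (Result.Index));
   else
      Ada.Text_IO.Put_Line ("not found");
   end if;
end Find_Result_Demo;
```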
****************************************************************
From: Robert A. Duff
Sent: Friday, September 17, 2004 9:02 AM
I agree with the principle, but I think Ada doesn't have what it takes
to do it cleanly and efficiently. If this were, say, ML then we might
want to do something like this.
****************************************************************
From: Randy Brukardt
Sent: Friday, September 24, 2004 8:57 PM
The Update_Element routine is available in all of the containers. It has the
profile of:
procedure Update_Element
(Position : in Cursor;
Process : not null access procedure (Element : in out Element_Type));
A recent discussion on comp.lang.ada points out that this requires defining
a subprogram to make any in-place modification. That's clunky, and for
larger objects, there really isn't any option.
Matt originally had returned an access type here, but he needed a generic to
work with user defined types, which made the solution rather clunky. It also
required elements to be constrained (so that they could be aliased), which
was considered ugly for definite types.
With the recent approval of AI-363 and AI-318, both of these objections seem
to have disappeared. AI-363 repeals 3.6(11), so an aliased component can be
unconstrained. And AI-318 defines anonymous access for function returns,
which avoids the need to introduce a named access type.
So, should we reconsider a specification of:
function Update_Element (Position : in Cursor)
return not null access Element_Type;
with Query_Element changed to:
function Query_Element (Position : in Cursor)
return not null access constant Element_Type;
These would make most simple updates easy; the only loss is the "free"
binding of the element to a name. Instead of:
   declare
      procedure Increment (Count : in out Natural) is
      begin
         Count := Count + 1;
      end Increment;
   begin
      Update_Element (A_Cursor, Increment'Access);
   end;
we'd have:
   declare
      Item : Natural renames Update_Element(A_Cursor).all;
   begin
      Item := Item + 1;
   end;
I'm not sure that putting access types into the specification is a good
idea, but it certainly would be easier to use. The returned access would
need rules similar to those for cursors (so that it would be erroneous to
use it after certain operations), so it would complicate the wording of
the standard a bit.
In the absence of a lot of support for this idea, we probably should stay
with the current specification (we need to freeze this thing soon).
****************************************************************
From: Tucker Taft
Sent: Friday, September 24, 2004 9:18 PM
Randy Brukardt wrote:
> The Update_Element routine is available in all of the containers. It has the
> profile of:
>
> procedure Update_Element
> (Position : in Cursor;
> Process : not null access procedure (Element : in out Element_Type));
>
> A recent discussion on comp.lang.ada points out that this requires defining
> a subprogram to make any in-place modification. That's clunky, and for
> larger objects, there really isn't any option.
I can't parse this last sentence. What do you mean "and for
larger objects, there really isn't any option"?
> ...
> In the absence of a lot of support for this idea, we probably should stay
> with the current specification (we need to freeze this thing soon).
Leave it as is, in my view. Safety over convenience, and
defining a procedure locally isn't that inconvenient, once
you get used to the idea.
****************************************************************
From: Randy Brukardt
Sent: Friday, September 24, 2004 9:35 PM
> I can't parse this last sentence. What do you mean "and for
> larger objects, there really isn't any option"?
Yuck. Let's try again:
That's clunky, and for larger objects, there really isn't any alternative to
using in-place modification.
> Leave it as is, in my view. Safety over convenience, and
> defining a procedure locally isn't that inconvenient, once
> you get used to the idea.
Kinda my position, too, but I thought it was good to ask the wider community
before freezing this for all time.
****************************************************************
From: Matthew Heaney
Sent: Saturday, September 25, 2004 12:23 AM
Actually, having a function like this would simplify the API a little. You
could get rid of Update_Element and Query_Element, and change the Element
function like this:
function Element (C : Cursor) return access Element_Type;
This does everything Query_Element, Update_Element, and Element do. You
could then say:
E : ET renames Element (C).all;
or
Op (Element (C).all);
This is exactly analogous to the STL:
   void f(vect_t::iterator i)
   {
      E& e = *i;
      g(*i);
      //...
   }
For the set, you'd want to pass a constant view, so Element would be
declared this way:
function Element (C : Cursor) return access constant Element_Type;
and then a renaming of E, like this:
E : ET renames Element (C).all;
would be a constant view.
This is analogous to the C++ statement:
   void f(set_t::const_iterator i)
   {
      const E& e = *i;
   }
In the case of a map, the key selector returns a constant view, and the
element selector returns a variable view:
function Key (C : Cursor) return access constant Key_Type;
function Element (C : Cursor) return access Element_Type;
so you could say:
K : KT renames Key (C).all;
E : ET renames Element (C).all;
where K is a constant view, and E is a variable view. This is analogous to
the C++ statements:
   void f(map_t::iterator i)
   {
      const K& k = i->first;
      E& e = i->second;
   }
****************************************************************
From: Nick Roberts
Sent: Saturday, September 25, 2004 9:19 AM
This seems like such a better interface design, I say it is worth making
the change.
****************************************************************
From: Robert A. Duff
Sent: Saturday, September 25, 2004 11:04 AM
> Leave it as is, in my view. Safety over convenience, and
> defining a procedure locally isn't that inconvenient, once
> you get used to the idea.
Well, I don't like dangling pointers, either, but I lean the other way
on this one: I think being forced to move a small hunk of code away
from where I want it, wrap it in several lines of syntax, and clutter
the namespace with a meaningless procedure name really IS a big pain.
(That is, forcing the programmer to make an abstraction boundary at a
place where it's inappropriate to do so.)
I view this as "safety versus readability", not "safety versus
convenience" -- and that makes the choice not so obvious.
If you give me Lisp lambdas, I wouldn't mind the procedure approach.
OTOH, if you give me C++ references, the ref approach could be safe
(I think?).
****************************************************************
From: Pascal Obry
Sent: Saturday, September 25, 2004 12:57 AM
> So, should we reconsider a specification of:
>
> function Update_Element (Position : in Cursor)
> return not null access Element_Type;
Why not keep both ?
****************************************************************
From: Matthew Heaney
Sent: Sunday, September 26, 2004 3:13 AM
You could, but the only reason we needed the procedures is because we needed
a reference to the in-place object. The function gives you that, which
obviates the need for the procedure.
For the vectors and lists, you would have this:
function Element (C : Cursor)
return not null access Element_Type;
For the sets, you would have this:
function Element (C : Cursor)
return not null access constant Element_Type;
For the maps, you would have these:
function Key (C : Cursor)
return not null access constant Key_Type;
function Element (C : Cursor)
return not null access Element_Type;
The (in)famous wordcount program would look like:
   declare
      C : Cursor;
      B : Boolean;
   begin
      Insert (M, Word, 0, C, B);
      declare
         N : Natural renames Element (C).all;
      begin
         N := N + 1;
      end;
   end;
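[Editor's note: For comparison, the following is a sketch of the same
wordcount step written with the pass-a-procedure form. It is not part of
the original message. It assumes the Update_Element profile as later
standardized in Ada 2005, which takes the container as a parameter and, for
maps, passes the key to Process; that differs from the draft profile quoted
earlier in this thread. The names Count_Maps, Count_Word, and Increment are
hypothetical.]

```ada
with Ada.Containers.Indefinite_Ordered_Maps;
with Ada.Text_IO;

procedure Wordcount_Demo is
   package Count_Maps is new Ada.Containers.Indefinite_Ordered_Maps
     (Key_Type     => String,
      Element_Type => Natural);
   use Count_Maps;

   M : Map;

   procedure Count_Word (Word : String) is
      C : Cursor;
      B : Boolean;

      --  The local "Process" subprogram that the callback form requires.
      procedure Increment (Key : String; N : in out Natural) is
      begin
         N := N + 1;
      end Increment;
   begin
      --  Insert a zero count if Word is new; either way, C designates
      --  the node for Word afterwards.
      Insert (M, Word, 0, C, B);
      Update_Element (M, C, Increment'Access);
   end Count_Word;
begin
   Count_Word ("ada");
   Count_Word ("stl");
   Count_Word ("ada");
   Ada.Text_IO.Put_Line ("ada:" & Natural'Image (Element (M, "ada")));
end Wordcount_Demo;
```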
****************************************************************
From: Martin Dowie
Sent: Monday, September 27, 2004 3:10 AM
This looks very elegant, readable and more comprehensible to me...
****************************************************************
From: Pascal Leroy
Sent: Monday, September 27, 2004 3:51 AM
> Leave it as is, in my view. Safety over convenience, and
> defining a procedure locally isn't that inconvenient, once
> you get used to the idea.
I agree. It seems to me that returning access values is opening the door
to all sorts of dangling pointers bugs. Consider the case of a vector,
which is probably implemented using one or several arrays. Returning an
access value designating an element means that if/when an array is
reallocated, the access becomes dangling. And that can happen at the drop
of a hat. So it would be very hard indeed for the client to prevent nasty
bugs from happening. I believe that safety should be of paramount
importance when making decisions about the design of the containers: we
don't want to add cases of erroneousness unless we absolutely have to.
The alternative is to say that access values never become dangling, but
that would unnecessarily constrain the implementation. For instance, it
would not be legitimate for the implementation of vectors to reallocate an
array.
Bob wrote:
> Well, I don't like dangling pointers, either, but I lean the
> other way on this one: I think being forced to move a small
> hunk of code away from where I want it, wrap it in several
> lines of syntax, and clutter the namespace with a meaningless
> procedure name really IS a big pain.
> (That is, forcing the programmer to make an abstraction
> boundary at a place where it's inappropriate to do so.)
I suspect that readability is in the eye of the beholder, to some extent.
I'd rather see a crisp, 10-line subprogram gathering all the processing
that pertains to an element (or a key), than a 500-line procedure
squirreling away a pointer at the beginning and using it in random places
throughout the code.
Furthermore, I am not convinced that the programmer would be forced to
create "inappropriate abstractions". It seems to me that the operations
that are being performed on an element are good candidates for
encapsulation and/or reuse (remember, they don't have to be local) so most
of the time they are exactly the abstraction you want to create.
And sorry, I don't care if you have to type a few extra lines of syntax.
****************************************************************
From: Pascal Obry
Sent: Monday, September 27, 2004 3:41 AM
Ok, but this is only a wordcount program! In some cases the changes that
need to be made to the element could be considerably more complex. In such
cases it would certainly be better to have the procedure "callback". But well it is true
that it is always possible to pass the result of the function to a
procedure... Looks like the function is more versatile after all!
****************************************************************
From: Robert A. Duff
Sent: Monday, September 27, 2004 4:02 PM
Before I start ranting, let me say this first: I agree with whoever said
we should provide both. I don't think we need to be super-minimalist
here.
Pascal wrote:
> Tuck wrote:
>
> > Leave it as is, in my view. Safety over convenience, and
> > defining a procedure locally isn't that inconvenient, once
> > you get used to the idea.
>
> I agree. It seems to me that returning access values is opening the door
> to all sorts of dangling pointers bugs. Consider the case of a vector,
> which is probably implemented using one or several arrays. Returning an
> access value designating an element means that if/when an array is
> reallocated, the access becomes dangling. And that can happen at the drop
> of a hat.
I haven't read the entire latest version, but in the C++ STL, it doesn't
happen at the drop of a hat. It can happen at fairly well-defined
places. I hope that's still true in the Ada proposal.
>... So it would be very hard indeed for the client to prevent nasty
> bugs from happening.
Actually, it's not so hard, I think: If you call the pointer-returning
function, immediately do .all of that, and rename the result, you get
essentially what the pass-a-procedure interface gives you, with somewhat
less syntactic cruft and namespace pollution. The renaming can't
dangle, unless you modify the data structure in the scope of the
renaming. (Here, by "modify the data structure" I mean things like
adding and deleting elements -- as opposed to modifying the particular
element we've got our hands on.) But in the pass-a-procedure interface,
the same is true: if you modify the data structure within that
procedure, the parameter becomes a dangling pointer (at least, if passed
by reference, which would usually be true in the cases we're talking
about).
>... I believe that safety should be of paramount
> importance when making decisions about the design of the containers: we
> don't want to add cases of erroneousness unless we absolutely have to.
>
> The alternative is to say that access values never become dangling, but
> that would unnecessarily constrain the implementation. For instance, it
> would not be legitimate for the implementation of vectors to reallocate an
> array.
>
> Bob wrote:
>
> > Well, I don't like dangling pointers, either, but I lean the
> > other way on this one: I think being forced to move a small
> > hunk of code away from where I want it, wrap it in several
> > lines of syntax, and clutter the namespace with a meaningless
> > procedure name really IS a big pain.
> > (That is, forcing the programmer to make an abstraction
> > boundary at a place where it's inappropriate to do so.)
>
> I suspect that readability is in the eye of the beholder, to some extent.
> I'd rather see a crisp, 10-line subprogram gathering all the processing
> that pertains to an element (or a key), than a 500-line procedure
> squirreling away a pointer at the beginning and using it in random places
> throughout the code.
>
> Furthermore, I am not convinced that the programmer would be forced to
> create "inappropriate abstractions". It seems to me that the operations
> that are being performed on an element are good candidates for
> encapsulation and/or reuse (remember, they don't have to be local) so most
> of the time they are exactly the abstraction you want to create.
Well, I strongly disagree with the above paragraph.
First of all, a philosophical point: it is not our place, as language
designers, to decide that certain things are "good candidates" for
encapsulation, and then *force* programmers to encapsulate on exactly
those boundaries. Instead, we should be providing tools for
encapsulation, and let programmers choose where to use them.
I don't usually like "500-line squirreling" procedures either, but it's
not our job to tell people how many lines of code are appropriate in any
given procedure.
Second, the procedures in question *do* have to be local, in nearly all
cases, because they need more information than just the Element
parameter. That is, the useful (perhaps reusable) abstraction is
probably a procedure with *two* parameters, so we would need a local
wrapper procedure with one parameter. Consider this example:
   procedure Grind_Upon_String(S: String) is
   begin
      for I in S'Range loop
         Insert_In_Table(Key => S(I), Value => I);
      end loop;
   end Grind_Upon_String;
Kind of silly: we're inserting a Key,Value pair into a table, consisting
of the character and its index in the string. Instead of the index in
the string, it might well be some other local variable of
Grind_Upon_String; you get the idea.
The point is, the programmer has chosen Insert_In_Table as the
appropriate abstraction. Wrapping it in another abstraction gains
nothing:
   procedure Grind_Upon_String(S: String) is
   begin
      for I in S'Range loop
         declare
            procedure Insert_In_Table_With_I_As_Value
               (X: Character) is
            begin
               Insert_In_Table(Key => X, Value => I);
            end Insert_In_Table_With_I_As_Value;
         begin
            Insert_In_Table_With_I_As_Value(S(I));
         end;
      end loop;
   end Grind_Upon_String;
I'd be tempted to call Insert_In_Table_With_I_As_Value
"Process_Element", which is a meaningless name -- which is appropriate,
because it's a meaningless [non]abstraction.
The fact that the above has a loop is not relevant to my point -- we've
found an element by some means, and we want to do something with it (or
perhaps modify it).
I think if you inspect your own code, and look for cases where you're
processing one element of some data structure (either in a loop, or
based on a lookup, or based on some other info), you will find few cases
where the code to process one element is exactly one call to a procedure
with exactly one parameter (the element).
Consider a simple algorithm for reversing a sequence, by moving two
indices (or cursors!) inward from both ends. Surely the "swap the two
current items" code doesn't deserve its own procedure (although "swap
two items" with two parameters probably does).
> And sorry, I don't care if you have to type a few extra lines of syntax.
Come on, Pascal! Surely you know me better than that! When I complain
about verbosity, I'm complaining about having to read useless junk --
not about having to type it in. See the second Grind_Upon_String above,
which has a lot of "noise" compared to the amount of code conveying
useful information to the reader.
By the way, whether we use an accessor-returning-pointer or
pass-a-procedure, it seems like we need two versions: one for read-only
access, and one for read/write access.
****************************************************************
From: Randy Brukardt
Sent: Monday, September 27, 2004 5:07 PM
Bob Duff wrote:
> Before I start ranting, let me say this first: I agree with whoever said
> we should provide both. I don't think we need to be super-minimalist
> here.
I don't see the point. No one is going to write "Process" subprograms if
they don't have to. Once we've defined access versions, we're done there.
(It would be especially good if we could write the language rules to avoid
dangling in most cases.)
I do agree in one sense though; I see no reason to drop the convenient
value-returning function and value-replacing procedure. Doing so would just
clutter up the code with .alls; and you really need the procedure for
indefinite types (because you can't change constraints via .all there - the
objects can be constrained and would need to be reallocated).
> Actually, it's not so hard, I think: If you call the pointer-returning
> function, immediately do .all of that, and rename the result, you get
> essentially what the pass-a-procedure interface gives you, with somewhat
> less syntactic cruft and namespace pollution. The renaming can't
> dangle, unless you modify the data structure in the scope of the renaming.
But there isn't a way to enforce this usage (unless the infinite
accessibility idea flies). There would be a lot more cases of erroneous
usage, which I know will make some reviewers nervous.
> But in the pass-a-procedure interface,
> the same is true: if you modify the data structure within that
> procedure, the parameter becomes a dangling pointer (at least, if passed
> by reference, which would usually be true in the cases we're talking
> about).
Humm, sounds like a case that needs to be enumerated in the "Erroneous
Execution" part of the standard. "If Element_Type is not a by-copy type,
...."
> By the way, whether we use an accessor-returning-pointer or
> pass-a-procedure, it seems like we need two versions: one for read-only
> access, and one for read/write access.
We have Query_Element (read-only) and Update_Element (read-write) currently.
I would expect that we'd only change to anon access returns for those, with
no other changes to the spec. (as I noted above).
****************************************************************
From: Robert A. Duff
Sent: Monday, September 27, 2004 5:18 PM
I wrote:
> > Before I start ranting, let me say this first: I agree with whoever said
> > we should provide both. I don't think we need to be super-minimalist
> > here.
Randy replied:
> I don't see the point. No one is going to write "Process" subprograms if
> they don't have to. ...
That may well be true. If so, it's good evidence that the Process
subprogram is a bitter pill (for curing an admittedly nasty disease).
> > Actually, it's not so hard, I think: If you call the pointer-returning
> > function, immediately do .all of that, and rename the result, you get
> > essentially what the pass-a-procedure interface gives you, with somewhat
> > less syntactic cruft and namespace pollution. The renaming can't
> > dangle, unless you modify the data structure in the scope of the renaming.
>
> But there isn't a way to enforce this usage (unless the infinite
> accessibility idea flies). There would be a lot more cases of erroneous
> usage, which I know will make some reviewers nervous.
Correct. We could put in a NOTE recommending renaming. Whenever I
write these sorts of return-pointer things, I put in a comment saying
"Beware dangling pointers", and recommending renames.
****************************************************************
From: Ehud Lamm
Sent: Tuesday, September 28, 2004 2:27 AM
For what it's worth I am with Bob as regards this issue. The style of
programming implied by the collection interface (as inspired by the STL and
other collection interfaces etc.) encourages the creation of small and rather
meaningless element processing procedures, ones that are often quite
tightly dependent on local scope.
Without "anonymous functions" this style of programming can have a bad
impact on readability and reliability. Notice that C#, for example, added
"anonymous delegates" to better support this style of programming. I guess
that's out of the question for us at this point...
We should keep in mind that the renaming "trick" requires deep understanding
of the language, and is quite subtle for beginners to understand. A NOTE is
a good idea, as well as a style guide (is there going to be a new AQ&S
guide?)
****************************************************************
From: Matthew Heaney
Sent: Tuesday, September 28, 2004 8:59 AM
I had this problem today (see ai302/examples/shapes). I needed to sort an array, which looks like this:
   Rect : aliased Rectangle_Type;
   Line : aliased Line_Type;
   Face : aliased Face_Type;
   type Shape_Array is
      array (Positive range <>) of access Shape_Type'Class;
   V : Shape_Array := (Rect'Access,
                       Line'Access,
                       Face'Access);
I decided to try out the fancy new array declaration syntax, that allows an anonymous access type as the array element subtype.
   procedure Sort is
      new Ada.Containers.Generic_Array_Sort
        (Positive,
         ???,
         Shape_Array);
I have no actual type to match the Generic_Array_Sort.Element_Type formal. To solve this problem, I came up with another kind of sorting procedure:
   generic
      type Index_Type is (<>);
      with function Less (Left, Right : Index_Type)
         return Boolean is <>;
      with procedure Swap (Left, Right : Index_Type) is <>;
   procedure Generic_Sort (First, Last : in Index_Type'Base);
That allows me to say:
   Sort_V:
   declare
      function Less (I, J : Positive) return Boolean is
         IW : constant Point_Type := West (V (I).all);
         JW : constant Point_Type := West (V (J).all);
      begin
         return IW.X < JW.X;
      end;
      procedure Swap (I, J : Positive) is
         E : Shape_Type'Class renames V (I).all;
      begin
         V (I) := V (J);
         V (J) := E'Access;
      end;
      procedure Sort is
         new Generic_Sort (Positive);
   begin
      Sort (V'First, V'Last);
   end Sort_V;
I don't know if this will be a problem or not, but I thought I'd bring it up...
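[Editor's note: The message above gives only the specification of Generic_Sort. The following is a minimal sketch of one possible body, not the one from the reference implementation. It assumes a simple insertion sort, chosen only because it needs nothing beyond Less and Swap. The demo array V, its contents, and the name Generic_Sort_Demo are hypothetical. Note that V has an anonymous array type, the case raised in this thread.]

```ada
with Ada.Text_IO;

procedure Generic_Sort_Demo is

   --  Specification as given in the message above.
   generic
      type Index_Type is (<>);
      with function Less (Left, Right : Index_Type)
         return Boolean is <>;
      with procedure Swap (Left, Right : Index_Type) is <>;
   procedure Generic_Sort (First, Last : in Index_Type'Base);

   --  Hypothetical body: an insertion sort that only ever calls
   --  Less and Swap, so it never names an element or array type.
   procedure Generic_Sort (First, Last : in Index_Type'Base) is
      J : Index_Type'Base;
   begin
      if First >= Last then
         return;  --  zero or one element: nothing to do
      end if;
      for I in Index_Type'Succ (First) .. Last loop
         J := I;
         --  Sink the element at J toward First until it is in order.
         while J > First and then Less (J, Index_Type'Pred (J)) loop
            Swap (J, Index_Type'Pred (J));
            J := Index_Type'Pred (J);
         end loop;
      end loop;
   end Generic_Sort;

   --  An array object of an anonymous array type: it cannot be passed
   --  to a generic that has a formal array type.
   V : array (1 .. 5) of Integer := (5, 3, 4, 1, 2);

   function Less (I, J : Positive) return Boolean is
   begin
      return V (I) < V (J);
   end Less;

   procedure Swap (I, J : Positive) is
      Tmp : constant Integer := V (I);
   begin
      V (I) := V (J);
      V (J) := Tmp;
   end Swap;

   --  Less and Swap are picked up through the "is <>" defaults.
   procedure Sort is new Generic_Sort (Positive);

begin
   Sort (V'First, V'Last);
   for I in 1 .. 4 loop
      pragma Assert (V (I) <= V (I + 1));
      null;
   end loop;
   for I in V'Range loop
      Ada.Text_IO.Put (Integer'Image (V (I)));
   end loop;
   Ada.Text_IO.New_Line;
end Generic_Sort_Demo;
```

[Editor's note: Every comparison and every element move in this sketch goes through a formal subprogram call, which is the overhead discussed later in this thread.]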
****************************************************************
From: Matthew Heaney
Sent: Tuesday, September 28, 2004 12:11 PM
Actually, I just realized that this is an issue even in Ada today, since
you can declare an array object whose type is anonymous:
V : array (1 .. 3) of Shape_Class_Access;
We don't have a generic actual array type to match the Array_Type
formal. However, the Generic_Sort declared below works for this array
object declaration, too.
****************************************************************
From: Tucker Taft
Sent: Tuesday, September 28, 2004 12:35 PM
I would like to see a generic sort like that as well.
It is probably too late to standardize it, but getting
it into your "reference implementation" would certainly
be a start. I have never been completely happy with
the array sorts we provide, since they seem unnecessarily
"concrete." So long as the user provides the compare
and the swap, we really don't care what is the type of
the "array" or array-like thing.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, September 28, 2004 2:34 PM
That's true as long as you don't care about the usability and performance of
the result. But Sort is an expensive and common operation, and specializing
it enough to make it perform reasonably is valuable. I'd be very opposed to
a compare and swap sort as the only one provided.
To expand on that a bit, the usability issue should be obvious: you usually
would have to write a swap routine. While the compare often already exists
for other reasons, there almost never is a reason to have a swap.
The performance issue is simply that a swap routine injects an extra
subprogram call into the mix. Moreover, it prevents any optimization of
element movement - you have to use a straight swap even if something better
is available. (You can't take advantage of relinking elements or the fact
that list sorts are stable for free this way.) For small elements, that
overhead is substantial. (And on a generic sharing implementation, it is
even worse. Ours has to save/restore displays on formal subprogram calls.)
****************************************************************
From: Tucker Taft
Sent: Friday, September 2, 2004 4:23 PM
I was definitely not suggesting we drop the others.
I was saying that for my personal use, I have found
the existing ones overly "concrete." Your mileage
obviously varies, and I accept that. I think Matt's
very-generic sort would be nice to have, but I don't
think it is worth standardizing at this point...
****************************************************************
From: Matthew Heaney
Sent: Tuesday, September 28, 2004 4:16 PM
Has the language been modified to allow an anonymous access type as the
generic formal array element subtype?
   generic
      type IT is (<>);
      type ET (<>) is limited private;
      type Array_T is array (IT) of access ET; --legal?
   procedure GP (A : Array_T);
Can the type (with access constant element subtype):
type Array_T is array (1 .. 3) of access constant T;
be passed as the actual type for generic formal array type GP.Array_T?
Are there any other combinations?
****************************************************************
From: Randy Brukardt
Sent: Tuesday, September 28, 2004 4:28 PM
> Has the language been modified to allow an anonymous access type as the
> generic formal array element subtype?
>
> generic
> type IT is (<>);
> type ET (<>) is limited private;
> type Array_T is array (IT) of access ET; --legal?
> procedure GP (A : Array_T);
No.
> Can the type (with access constant element subtype):
>
> type Array_T is array (1 .. 3) of access constant T;
>
> be passed as the actual type for generic formal array type GP.Array_T?
No.
> Are there any other combinations?
Who cares? There are a lot of cases in Ada where you can't use an anonymous
type to do something. (You can't write a type conversion or qualified
expression, for instance.) If it hurts, don't do that. :-)
Anonymous types are supposed to be a convenience feature, not a cornerstone
of design. Use them sparingly.
****************************************************************
From: Matthew Heaney
Sent: Friday, September 28, 2004 4:40 PM
OK, but I think I gave a reasonable example. If I declare my array this
way:
   declare
      type Array_T is array (Positive range <>) of T_Access;
      E1 : aliased T;
      E2 : aliased T;
      E3 : aliased T;
      A : Array_T := (E1'Unchecked_Access,
                      E2'Unchecked_Access,
                      E3'Unchecked_Access);
   begin
      ...
   end;
The issue is that type T_Access is declared in an outer scope, and so
the language requires the use of 'Unchecked_Access. However, here
that's simply crying wolf.
I'd rather say:
   A : Array_T := (E1'Access,
                   E2'Access,
                   E3'Access);
but to do that I need to either declare a local access type, or declare
the array element subtype as an anonymous access type. I was trying to
be sparing, so I chose the latter, but then that created the problem
instantiating the generic...
****************************************************************
From: Tucker Taft
Sent: Tuesday, September 28, 2004 4:56 PM
I will admit I never noticed that these anonymous
access types didn't make it into generic formals.
It seems they should, presuming we have a good
definition for "statically matching subtypes."
That is the requirement, in general, for component
subtypes. I believe we allow them in the discriminant
part of formal discriminated types, so I don't see
why we shouldn't allow them in the component-subtype
definition for a formal array type. I think this
was an oversight rather than intentional.
****************************************************************
From: Tucker Taft
Sent: Tuesday, September 28, 2004 5:17 PM
Actually, the syntax for formal_array_type_definition
simply says array_type_definition, so anonymous
access types are permitted as the component type
in a generic formal. So I think Randy was wrong
in saying they weren't permitted.
****************************************************************
From: Randy Brukardt
Sent: Tuesday, September 28, 2004 5:24 PM
OK, but then we haven't defined a matching rule. Or does that somehow fall
out?
****************************************************************
From: Tucker Taft
Sent: Tuesday, September 28, 2004 7:35 PM
It requires that the component subtypes match statically.
That is defined for anonymous access types in 4.9.1(2) to
require that the designated subtypes match statically.
This was updated in AI-231 to also require that
null-exclusiveness and access-to-constantness match.
****************************************************************
From: Pascal Leroy
Sent: Wednesday, September 29, 2004 4:20 AM
> I think Randy and I were worried about two different kinds of
> dangling references. I was worried about the one that would
> occur if you left the scope where a vector object was
> declared. Randy was worrying about dangling references that
> would occur if you altered the vector object, with the side
> effect of some part of it being deallocated and then
> reallocated elsewhere.
Thanks for the example and the clarification. I have been watching this
thread with total bewilderment because I had no idea what problem you were
trying to solve.
My feeling is that the restrictions that you have to impose on the
function, in particular in the case where it returns a call to another
function, are so drastic as to seriously cripple
functions-returning-anonymous-accesses.
> This latter problem could happen with the access-procedure
> approach as well, and I believe it is not possible to create
> a reference that is so short-lived that you can completely
> eliminate that problem.
Well, to be honest, in the access-to-procedure approach, you could at
least "lock" the container while you are calling the access-to-procedure,
thereby detecting the situation where deallocation/reallocation would
happen. I am not saying that we should require that, but it would be a
viable implementation option for situations where safety is a prime
concern. On the other hand, if you return an access to a part of the
container, there is no way that you can prevent erroneousness.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, September 29, 2004 6:03 PM
> Well, to be honest, in the access-to-procedure approach, you could at
> least "lock" the container while you are calling the access-to-procedure,
> thereby detecting the situation where deallocation/reallocation would
> happen.
That's an excellent idea, and one that proves a decisive advantage for the
current approach.
> I am not saying that we should require that, but it would be a
> viable implementation option for situations where safety is a prime
> concern.
I'm not sure why we *shouldn't* require that. The check appears to be
cheaper than the check for cursor abuse, which we've mandated. And we have
to write the text about abuse in any case - we might as well use that to
make it safe.
In particular, while in Update_Element's Process routine, calling an
operation on the same container for the same element that could modify the
element would raise Program_Error. For vectors, we'd also want that to
happen for any operation that could expand the vector or make the cursor
ambiguous (see the Bounded Error for the definition of ambiguous).
There's actually not a problem if the element is by-copy for most of those
cases, but I'd rather that we didn't define the semantics based on
privacy-breaking properties of the element type. And this routine will
usually be used only on large, pass-by-reference objects.
The only case where the check could get complex is if a Process routine
called Update_Element on the same container, but a different element. We
could outlaw that to make the check easy, or we could allow it and pay a
price in a slightly more complex check.
We also need rules for Iterate. Checking for Iterate is more expensive
or overly broad.
Alternatively to mandating checks, we can make these cases bounded errors;
either Program_Error is raised, or (some) element is modified. That would
reduce the need to make checks to just deletions of nodes (and reallocations
of vectors). I don't see any need for erroneousness. (Which should help
selling this to the safety-first folks.)
> On the other hand, if you return an access to a part of the
> container, there is no way that you can prevent erroneousness.
Right. I think that clearly states it should be left as is.
****************************************************************
From: Pascal Leroy
Sent: Thursday, September 29, 2004 2:06 AM
> I'm not sure why we *shouldn't* require that. The check
> appears to be cheaper than the check for cursor abuse, which
> we've mandated. And we have to write the text about abuse in
> any case - we might as well use that to make it safe.
Fine with me. I see safety as of critical importance for these packages
anyway.
> There's actually not a problem if the element is by-copy for
> most of those cases, but I'd rather that we didn't define the
> semantics based on privacy-breaking properties of the element
> type.
Agreed.
> The only case where the check could get complex is if a
> Process routine called Update_Element on the same container,
> but a different element. We could outlaw that to make the
> check easy, or we could allow it and pay a price in a
> slightly more complex check.
I could go either way. Element-level locking is going to require an extra
integer for each element. No big deal for big elements, but it might be a
significant overhead for small elements. (You need an integer, not a
boolean, to do the locking because of recursive calls.)
> Alternatively to mandating checks, we can make these cases
> bounded errors;
Right, but I would be slightly in favor of making the behavior
deterministic here. After all, the whole point of this library is that
you can port your code more easily. Bounded errors can cause nasty
porting problems.
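[Editor's note: The "lock" idea discussed in the preceding messages can be illustrated with a small sketch. This is not proposed wording or any vendor's implementation. It assumes the coarser, container-level variant (one counter per container rather than one per element), and the names Toy_Vector, Busy, Append, and Lock_Demo are hypothetical. The counter is an integer rather than a Boolean so that nested Update_Element calls unlock correctly.]

```ada
with Ada.Text_IO;

procedure Lock_Demo is

   type Int_Array is array (1 .. 10) of Integer;

   --  Hypothetical toy container.  It is declared limited so that it
   --  is passed by reference, which makes the lock count visible to
   --  any callback that reaches the same object by another path.
   type Toy_Vector is limited record
      Items : Int_Array := (others => 0);
      Last  : Natural := 0;
      Busy  : Natural := 0;  --  > 0 while a Process call is active
   end record;

   procedure Append (V : in out Toy_Vector; Item : in Integer) is
   begin
      if V.Busy > 0 then
         raise Program_Error;  --  tampering during Update_Element
      end if;
      V.Last := V.Last + 1;
      V.Items (V.Last) := Item;
   end Append;

   procedure Update_Element
     (V       : in out Toy_Vector;
      Index   : in Positive;
      Process : not null access procedure (Element : in out Integer)) is
   begin
      V.Busy := V.Busy + 1;  --  lock
      begin
         Process (V.Items (Index));
      exception
         when others =>
            V.Busy := V.Busy - 1;  --  unlock even on failure
            raise;
      end;
      V.Busy := V.Busy - 1;  --  unlock
   end Update_Element;

   V : Toy_Vector;

   procedure Increment (Element : in out Integer) is
   begin
      Element := Element + 1;
   end Increment;

   procedure Tamper (Element : in out Integer) is
   begin
      Element := 0;
      Append (V, 99);  --  modifies the container being updated
   end Tamper;

begin
   Append (V, 41);
   Update_Element (V, 1, Increment'Access);
   pragma Assert (V.Items (1) = 42);
   begin
      Update_Element (V, 1, Tamper'Access);
      Ada.Text_IO.Put_Line ("tampering went undetected");
   exception
      when Program_Error =>
         Ada.Text_IO.Put_Line ("tampering detected");
   end;
   pragma Assert (V.Busy = 0);  --  the lock was released
end Lock_Demo;
```

[Editor's note: No comparable check is possible once an access value designating the element has been handed to the caller, since the container cannot tell when the caller has finished with it.]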
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 30, 2004 12:25 PM
> I could go either way. Element-level locking is going to require an extra
> integer for each element. No big deal for big elements, but it might be a
> significant overhead for small elements. (You need an integer, not a
> boolean, to do the locking because of recursive calls.)
I was thinking of a list of locked elements in the container; much less
space overhead. Calling Update_Element a second time on the same element
surely should be detected, so I wouldn't try to allow recursive calls. (But
even if that is allowed, a list of locked elements still would work. The
list would usually be small - that is 0 or 1 elements.)
Note that this is another case where an "unchecked" set of containers could
be defined in a secondary standard if the overhead really matters.
> > Alternatively to mandating checks, we can make these cases
> > bounded errors;
>
> Right, but I would be slightly in favor of making the behavior
> deterministic here. After all, the whole point of this library is that
> you can port your code more easily. Bounded errors can cause nasty
> porting problems.
I agree.
****************************************************************
From: Tucker Taft
Sent: Thursday, September 30, 2004 12:37 PM
I think mandating this is a mistake at this point.
I think it requires too much careful analysis of
the implementation implications.
I think we should clearly define what is "evil" but
forcing all implementations to catch all evil behavior
is overkill, I believe. We want the user to know
what behavior is portable, and we can rely on implementors
to try to catch most evil behaviors, but give up when
it gets too hard. By saying that the behavior is
"unspecified" in the evil cases, we are hopefully
making a clear indication to the user that it is
non portable.
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 30, 2004 1:04 PM
> I think mandating this is a mistake at this point.
> I think it requires too much careful analysis of
> the implementation implications.
I've already done that, and it is a fairly simple check. That, of course,
depends on exactly what is prohibited.
> I think we should clearly define what is "evil" but
> forcing all implementations to catch all evil behavior
> is overkill, I believe.
I agree, but this particular case (at least the vast majority of it) is easy
to check, without much space overhead.
We can't reliably detect dangling cursors (I've tried, and have concluded it
has to be erroneous); but this can be detected.
My intent is for our implementation to detect all of the bounded error cases
and most dangling cursors. Which demonstrates Pascal's point: moving from
some other implementation to ours could very well cause problems, because
we're detecting problems that the other implementation ignores.
> We want the user to know
> what behavior is portable, and we can rely on implementors
> to try to catch most evil behaviors, but give up when
> it gets too hard. By saying that the behavior is
> "unspecified" in the evil cases, we are hopefully
> making a clear indication to the user that it is
> non portable.
Someone (perhaps it was you) told me that "unspecified" was worse than
erroneous.
When I've said "unspecified" in the containers text, I really mean that some
result is returned, or some exception is raised. But not that unrelated
memory is overwritten, or that the command to launch the missiles is sent.
That implies that the containers are not compiled with checks suppressed,
for instance.
I wonder if we need to be a bit tighter than a blanket "unspecified". One
way to do it would be to define a "corrupted container", and then to say
that any operation on a corrupted container either raises some exception,
never returns, or returns with any function result or "out" parameters
having unspecified values.
Is this worth doing? (It would improve the safety a bit.)
****************************************************************
From: Matthew Heaney
Sent: Thursday, September 30, 2004 1:21 PM
> Is this worth doing? (It would improve the safety a bit.)
Functions that return anonymous access types = yes.
Adding extra safety checks = no.
****************************************************************
From: Matthew Heaney
Sent: Thursday, September 30, 2004 1:22 PM
Obviously, I'm with Tucker.
Manipulating the container (specifically, changing its cardinality)
during (passive) iteration is a Bad Thing to do, but a container should
not be required to detect this.
I am in favor of functions returning anonymous access types that
designate container elements. This is what the STL does, and this is
what Charles (sort of) does. I'm not worried about dangling references,
since it's no different from:
   declare
      X : Integer_Array_Access := new Integer_Array (1 .. 1);
      I : Integer renames X (X'First);
   begin
      Free (X);
      I := 42; -- dangling reference
   end;
The guideline for programmers is to always rename the result of the
function, and to declare the object in the innermost scope possible:
   procedure Op (L : in out List) is -- for example
      C : Cursor := First (L);
   begin
      if Has_Element (C) then
         declare
            I : Integer renames Query_Element (C).all;
         begin
            if I = 42 then
               Delete (L, C);
               --reference to I would be dangling here, but...
            end if;
         end;
         --here there is no I to reference
      end if;
   end Op;
I do this all the time. For example, I have an app that uses a list as
a queue. Each list element has a reference count that indicates how
many objects are referring to that queue element (each object has its
own list cursor). When I append a new item to the queue, or when a
client decrements its own contribution to the count, I inspect the
front-most item and delete it if the reference count is 0. Something like:
   procedure Unjoin (My_Cursor : in out Cursor) is
      E : Entry_Type renames To_Access (My_Cursor).all;
   begin
      E.Ref_Count := E.Ref_Count - 1;
      if E.Ref_Count = 0 and then My_Cursor = First (Q) then
         Delete (Q, My_Cursor);
         --any reference here to E would be dangling, but...
      else
         My_Cursor := No_Element;
      end if;
   end Unjoin;
   --here there is no E to reference
This behavior is simply a consequence of the nature of containers, which
are merely a mechanism for storing and accessing elements. It's the job
of the container to stay out of the element's way.
Worrying about a single element is the least of your problems, since the
entire container can go away:
   declare
      C : Cursor;
   begin
      declare
         L : List;
      begin
         -- ... populate L
         C := First (L);
      end;
      Replace_Element (C, By => 42); -- oops!
   end;
As you can see, it's quite easy to have a dangling reference, even
without functions that return an anonymous access type.
There are debug versions of the STL. Something like that could easily
be done for the AI-302 containers. A vendor could provide a specialized
version of the library (hey, I'd even write it) that detects errors such
as dangling cursor references, etc, but without regard for performance.
When the application developer is satisfied, he can simply adjust his
include path to get the performance-optimized version.
****************************************************************
From: Tucker Taft
Sent: Thursday, September 30, 2004 1:17 PM
I agree if the implementation is manageable, and
the definition is relatively short, safety is
worth the effort. However, Ada doesn't always
detect dangling references, though it makes an
effort to minimize them. I think we need to
put this AI to bed very soon, so I am reluctant
to keep fiddling with it. I'm sure Randy feels
the same way, so I'll trust Randy to make only
"appropriate" changes at this point.
I still see the decision about Update_Element vs.
some kind of Element_Ptr function as up in the air.
Do you feel a decision has been made one way or
the other?
Independent of that decision, should I put in some energy
to define the accessibility level for anon access
function results to at least enable safe definition
of functions like Element_Ptr, if not for Containers,
perhaps for other similar interfaces? As defined now,
the anon-access function returns don't really provide
much of any added power or safety to the language.
If we can come up with a definition that allows them
to be used for things *like* Element_Ptr, that would
seem to give them some real added value.
Guidance welcomed!
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 30, 2004 1:42 PM
> I agree if the implementation is manageable, and
> the definition is relatively short, safety is
> worth the effort. However, Ada doesn't always
> detect dangling references, though it makes an
> effort to minimize them. I think we need to
> put this AI to bed very soon, so I am reluctant
> to keep fiddling with it. I'm sure Randy feels
> the same way, so I'll trust Randy to make only
> "appropriate" changes at this point.
I agree, although I'm trying to get a feeling for what appropriate is. I'd
appreciate some comments on whether an unqualified "unspecified" is too
broad.
> I still see the decision about Update_Element vs.
> some kind of Element_Ptr function as up in the air.
> Do you feel a decision has been made one way or
> the other?
Personally, I agree with Pascal. The fact that the *possibility* exists to
avoid problems with the callback versions is a significant advantage that
does not exist for the version that returns an access.
Moreover, at this late date, we need a strong consensus to make a change.
Given that Pascal and I are against a change in this area, I don't think we
have that.
> Independent of that decision, should I put in some energy
> to define the accessibility level for anon access
> function results to at least enable safe definition
> of functions like Element_Ptr, if not for Containers,
> perhaps for other similar interfaces? As defined now,
> the anon-access function returns don't really provide
> much of any added power or safety to the language.
> If we can come up with a definition that allows them
> to be used for things *like* Element_Ptr, that would
> seem to give them some real added value.
I personally don't think it is worth it. I've been convinced that you can't
completely eliminate dangling pointers, and there is no such thing as a
little bit of erroneousness. :-)
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 30, 2004 1:51 PM
> I am in favor of functions returning anonymous access types that
> designate container elements. This is what the STL does, and this is
> what Charles (sort of) does. I'm not worried about dangling references,
> since it's no different from:
>
> declare
> X : Integer_Array_Access := new Integer_Array (1 .. 1);
> I : Integer renames X (X'First);
> begin
> Free (X);
> I := 42; -- dangling reference
> end;
That's the problem. This is too unsafe for many of us;
Unchecked_Deallocation has that name for a reason!
...
> As you can see, it's quite easy to have a dangling reference, even
> without functions that return an anonymous access type.
Sure, but these are also much easier to detect than those on an access type.
> There are debug versions of the STL. Something like that could easily
> be done for the AI-302 containers. A vendor could provide a specialized
> version of the library (hey, I'd even write it) that detects errors such
> as dangling cursor references, etc, but without regard for performance.
> When the application developer is satisfied, he can simply adjust his
> include path to get the performance-optimized version.
We agreed at the Madison meeting that the default for the containers would
be safe, and that implementers could provide "unchecked" versions for
greater performance. You have it somewhat backwards.
In any case, I don't think that these checks will have much impact on
performance (the main cost is a bit of additional memory per element). If
you are willing to have a 99.5% detection (which I think is the best you can
do anyway), just comparing a pair of integer serial numbers will detect
virtually all dangling cursors. It doesn't quite catch all problems (if the
memory has been turned back to the OS, you might get a fault that you can't
handle; and it's possible that some other use of the memory might happen to
"fake" the serial number).
I intend that to be our primary implementation. If it turns out that the hit
matters for some application (and that will be proven by profiling, not
speculation!), it would be easy enough to provide an "unchecked" version.
Because we can't detect *all* such accesses means that we can't require
detection in general (thus the erroneous cases for dangling cursors). But it
seems silly to use that to say that we shouldn't detect the easy cases (like
deleting an element that we're actively modifying).
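[Editor's note: The serial-number check described above can be sketched as follows. This is an illustration under stated assumptions, not the implementation the message refers to. It assumes nodes live in a fixed pool whose slots are reused but never returned to the system, which sidesteps the "memory turned back to the OS" case mentioned above. The names Slot, Pool, Next_Serial, Check, and Serial_Demo are hypothetical.]

```ada
with Ada.Text_IO;

procedure Serial_Demo is

   type Slot is record
      Value  : Integer := 0;
      Serial : Natural := 0;  --  0 means the slot is free
   end record;

   Pool        : array (1 .. 4) of Slot;
   Next_Serial : Positive := 1;

   --  A cursor remembers which slot it designates and which serial
   --  number that slot carried when the cursor was created.
   type Cursor is record
      Index  : Natural := 0;
      Serial : Natural := 0;
   end record;

   function Insert (Value : Integer) return Cursor is
   begin
      for I in Pool'Range loop
         if Pool (I).Serial = 0 then
            Pool (I) := (Value => Value, Serial => Next_Serial);
            Next_Serial := Next_Serial + 1;
            return (Index => I, Serial => Pool (I).Serial);
         end if;
      end loop;
      raise Storage_Error;  --  pool exhausted
   end Insert;

   --  The whole check: compare a pair of integer serial numbers.
   procedure Check (C : in Cursor) is
   begin
      if C.Index = 0 or else Pool (C.Index).Serial /= C.Serial then
         raise Program_Error;  --  null or dangling cursor
      end if;
   end Check;

   procedure Delete (C : in Cursor) is
   begin
      Check (C);
      Pool (C.Index).Serial := 0;  --  free the slot for reuse
   end Delete;

   function Element (C : Cursor) return Integer is
   begin
      Check (C);
      return Pool (C.Index).Value;
   end Element;

   A, B : Cursor;

begin
   A := Insert (10);
   Delete (A);
   B := Insert (20);  --  reuses A's slot with a fresh serial number
   pragma Assert (Element (B) = 20);
   begin
      Ada.Text_IO.Put_Line (Integer'Image (Element (A)));
   exception
      when Program_Error =>
         Ada.Text_IO.Put_Line ("dangling cursor detected");
   end;
end Serial_Demo;
```

[Editor's note: The cost is one extra integer per element and per cursor. As the message says, detection cannot be complete in a real implementation, where freed memory may be reused for unrelated data that happens to match the serial number.]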
****************************************************************
From: Tucker Taft
Sent: Thursday, September 30, 2004 2:22 PM
I think it is fine to say "unspecified" for clearly
"evil" situations. Trying to specify exactly what
happens will just allow users to try to depend
on the specified behavior.
Randy Brukardt wrote:
>>I agree if the implementation is manageable, and
>>the definition is relatively short, safety is
>>worth the effort. However, Ada doesn't always
>>detect dangling references, though it makes an
>>effort to minimize them. I think we need to
>>put this AI to bed very soon, so I am reluctant
>>to keep fiddling with it. I'm sure Randy feels
>>the same way, so I'll trust Randy to make only
>>"appropriate" changes at this point.
>
>
> I agree, although I'm trying to get a feeling for what appropriate is. I'd
> appreciate some comments on whether an unqualified "unspecified" is too
> broad.
****************************************************************
From: Randy Brukardt
Sent: Thursday, September 30, 2004 2:44 PM
I think you missed the point. The wording currently says that the
"behavior is unspecified". Someone privately made the claim that that allows
*anything*, including overwriting unrelated objects or launching the
missile. There is no need to allow *that*.
So my question was whether we needed to tighten up the wording so that
exactly what is unspecified is more clear:
The operation raises some exception; or
Never returns; or
Returns with unspecified values for any function results and
in out and out parameters.
Note that I don't want to specify the results, only that any corruption be
limited to the container and parameters to operations.
****************************************************************
From: Tucker Taft
Sent: Thursday, September 30, 2004 3:24 PM
I think any time we say "unspecified" it is possible
for implementors to do something truly stupid.
I don't see why we need to go out of our way
to prevent that here.
****************************************************************
From: Pascal Leroy
Sent: Friday, October 1, 2004 7:56 AM
> I think we need to
> put this AI to bed very soon, so I am reluctant
> to keep fiddling with it. I'm sure Randy feels
> the same way, so I'll trust Randy to make only
> "appropriate" changes at this point.
Exactly. Based on all the traffic that I have read lately, and given
the tight schedule constraints that we have to obey, here is my
preference:
1 - Keep Update_Element the way it is (with access-to-subprogram
Process), and don't provide a version exposing pointers. This has been
discussed extensively in Phoenix, and the group disliked the pointer
version (one Tucker Taft in particular was quite vocal). Granted, some
language limitations have been lifted by other AIs, but I don't see that
it significantly affects the Phoenix decision. Plus, the
access-to-subprogram version can be used to build a "safe" container,
the other cannot.
2 - Don't require the container to be safe in the face of updates
occurring during a call to Update_Element. It's OK to let
implementations compete on the level of checking they do.
3 - Don't try to specify what is a "corrupt" container or what happens
when you operate on such a container. Just use "unspecified" as in the
current write-up.
4 - Give up on the idea of infinite accessibility depth for function
results. It's just too late for such a change. We don't have the time
necessary to work out the implications of that change. In particular,
the assume-the-worst rules for access parameters could significantly
reduce the usefulness of functions returning an anonymous access type.
AI 318 has been quite contentious in the past; don't rock the boat.
I am going to ask Randy to update AI 302 according to 1, 2, and 3 above.
Also, AI 318 was approved with changes at the last meeting, so I am
going to send it to WG9 in November (after editorial review), unless
someone asks for a letter ballot.
Sorry, folks, but we have to draw the line at some point.
****************************************************************
From: Robert A. Duff
Sent: Friday, October 1, 2004 1:01 PM
Pascal said:
> 1 - Keep Update_Element the way it is (with access-to-subprogram
> Process), and don't provide a version exposing pointers. This has been
> discussed extensively in Phoenix, and the group disliked the pointer
> version (one Tucker Taft in particular was quite vocal). Granted, some
> language limitations have been lifted by other AIs, but I don't see that
> it significantly affects the Phoenix decision. Plus, the
> access-to-subprogram version can be used to build a "safe" container,
> the other cannot.
If we use the pass-a-procedure approach (which I still don't like, for
reasons already stated), then we need to decide whether it's OK to
modify the container during that procedure. Pascal is saying here, "No,
but implementations need not check." But note that gave an example that
would suggest otherwise. He was using the return-a-pointer method, but
the issue is the same. Basically, his example was to do a lookup,
returning a pointer, and then delete that element (under some
circumstances). The deletion happens (just) before the pointer goes out
of scope, but the deletion is the last reference to that pointer.
We need to decide whether that's a reasonable thing to do (I think so).
If so, we shouldn't say (in the pass-a-proc method) it's an error to
modify the container during the passed procedure. Or (in the
return-a-ref method) that it's an error to modify the container while
that pointer still exists.
> Sorry, folks, but we have to draw the line at some point.
True.
****************************************************************
From: Randy Brukardt
Sent: Friday, October 1, 2004 2:55 PM
...
> If we use the pass-a-procedure approach (which I still don't like, for
> reasons already stated), then we need to decide whether it's OK to
> modify the container during that procedure. Pascal is saying here, "No,
> but implementations need not check." But note that gave an example that
> would suggest otherwise. He was using the return-a-pointer method, but
> the issue is the same. Basically, his example was to do a lookup,
> returning a pointer, and then delete that element (under some
> circumstances). The deletion happens (just) before the pointer goes out
> of scope, but the deletion is the last reference to that pointer.
>
> We need to decide whether that's a reasonable thing to do (I think so).
> If so, we shouldn't say (in the pass-a-proc method) it's an error to
> modify the container during the passed procedure. Or (in the
> return-a-ref method) that it's an error to modify the container while
> that pointer still exists.
It's absolutely unreasonable in the pass-a-proc situation, because you have
the object, not a pointer to it. You can't even get access to the cursor in
order to do a delete without standing on your head (you'd have to do an uplevel
access to it). That's very different than the return-a-pointer method, where
you're in the same scope.
Update_Element is intended for updates to the element. Period. Doing
anything else in the Process procedure means that you are on very thin ice.
If you want to delete the element, do that after you leave Update_Element
(that's what the return-a-pointer version is doing after all). That's why
the cases are very different, and why the pass-a-proc is preferred.
Moreover, we do not want to add any erroneous cases here, and there is no
need to do so. (On this point, I disagree with Pascal's resolution; it will
take *more* text to make these cases erroneous, and it will save very little
in terms of implementation. It also takes the rules out of line of the
Update_Element definition, which isn't good either.)
Note that none of this applies to Iterate, which passes a cursor. We have
extensive rules about cursors (dangling and otherwise), and moreover it
makes perfect sense to delete some records while iterating over them. But we
do have to say the order of iteration is unspecified if the container is
modified by the Process routine.
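[Editor's note: A minimal sketch of the pass-a-procedure style discussed
above. It is written against Ada.Containers.Vectors as later standardized
in Ada 2005, which is an assumption and not the draft wording under
discussion here. The names Update_Element_Demo, Integer_Vectors, and
Double are illustrative only. The Process routine receives the element
itself, with no cursor or pointer in hand.]

```ada
with Ada.Containers.Vectors;

procedure Update_Element_Demo is
   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Integer_Vectors.Vector;

   --  The Process routine sees only the element, not a cursor.
   procedure Double (Element : in out Integer) is
   begin
      Element := Element * 2;
   end Double;
begin
   V.Append (21);

   --  Update the first element in place through the callback.
   V.Update_Element (V.First_Index, Double'Access);

   --  Self-check: the stored element was modified, not a copy.
   if V.First_Element /= 42 then
      raise Program_Error;
   end if;
end Update_Element_Demo;
```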
****************************************************************
From: Pascal Leroy
Sent: Saturday, October 2, 2004 3:58 AM
> Moreover, we do not want to add any erroneous cases here, and
> there is no need to do so. (On this point, I disagree with
> Pascal's resolution; it will take *more* text to make these
> cases erroneous, and it will save very little in terms of
> implementation. It also takes the rules out of line of the
> Update_Element definition, which isn't good either.)
Technically I agree with you. I was under the impression that there was
no consensus on this topic, however, and surely we cannot afford to make
sizeable semantic changes to this AI in Atlanta. It would be good to hear
what other people feel. I know that some members have expressed
misgivings about the safety of the containers. Now would be a good time
to speak up...
****************************************************************
From: Randy Brukardt
Sent: Saturday, October 2, 2004 7:37 PM
Now having actually tried to write the wording, I've concluded my statement
above is incorrect. We have to have an erroneous wording in any case, to
cover for the possibility of someone calling Finalize or
Unchecked_Deallocation on the container. While that is as likely as Pam
Anderson marrying me, we have to cover it, and if it happened, the parameter
to Process would be unstable as the memory would be possibly reused. OTOH,
detecting deletions of the element itself can be done fairly cheaply, and
would catch virtually all of the real problems.
I'll include a complete write-up in the "list of changes" document for the
update.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, October 5, 2004 9:44 AM
A question about the exact semantics of the cursor-based Swap operation
for vectors and lists came up during my review of the Madison API.
The operation looks like this:
procedure Swap (I, J : Cursor);
The question was whether cursors I and J are allowed to swap elements
from different containers.
I think the answer is no, it's not allowed, and Program_Error is raised
if you try. But I'm not sure, so I wanted to ask for clarification.
The reason the question came up is that the semantics of Swap are
defined as follows:
procedure Swap (I, J : Cursor) is
EI : constant Element_Type := Element (I);
begin
Replace_Element (I, Element (J));
Replace_Element (J, EI);
end;
There's nothing in this algorithm that would prohibit I and J from
designating elements in different containers.
****************************************************************
From: Pascal Leroy
Sent: Tuesday, October 5, 2004 10:19 AM
The minutes are very clear, P_E is raised.
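[Editor's note: A minimal sketch of the rule stated above. It assumes the
Swap profile later standardized in Ada 2005, which takes the container as
an explicit parameter, unlike the draft profile quoted in this thread. The
names Swap_Demo, Integer_Vectors, A, B, and Raised are illustrative only.]

```ada
with Ada.Containers.Vectors;

procedure Swap_Demo is
   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);
   use Integer_Vectors;

   A, B   : Vector;
   Raised : Boolean := False;
begin
   A.Append (1);
   A.Append (2);
   B.Append (3);

   --  Both cursors designate elements of A: the elements are exchanged.
   A.Swap (A.First, A.Last);
   if A.First_Element /= 2 or else A.Last_Element /= 1 then
      raise Program_Error with "unexpected contents after Swap";
   end if;

   --  The second cursor designates an element of B, not of A.
   begin
      A.Swap (A.First, B.First);
   exception
      when Program_Error =>
         Raised := True;
   end;

   --  Self-check: the cross-container swap must have been rejected.
   if not Raised then
      raise Program_Error with "cross-container Swap was not rejected";
   end if;
end Swap_Demo;
```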
****************************************************************
From: Matthew Heaney
Sent: Tuesday, October 5, 2004 9:28 AM
I just finished reviewing the changes to the AI-302 draft that we made
in Madison, and I had a question that Randy suggested I post to the ARG
discussion list.
I didn't think of it during the meeting, but the C++ STL (on which
AI-302 is largely based) uses the name "reserve" for the operation we
named Ensure_Capacity.
We already have a function "Capacity" that has the same name and
semantics as the "capacity" vector member function in the STL.
The operation "Ensure_Capacity" has the same semantics as the "reserve"
member function in the STL, but it has a different name.
My question was whether the name "Reserve" would have been a better name
than "Ensure_Capacity", if only to avoid any unnecessary differences
between the two APIs. Any opinions?
****************************************************************
From: Robert Dewar
Sent: Tuesday, October 5, 2004 9:30 AM
I am in favor of changing the name to Reserve.
Ensure_Capacity is a bit odd anyway. I certainly don't call a hotel
to ensure capacity for my upcoming stay :-)
****************************************************************
From: Pascal Leroy
Sent: Tuesday, October 5, 2004 10:09 AM
But with this analogy Ensure_Capacity is better, because the semantics are
really that the hotel builds new rooms if you come with many friends.
I could go either way, but I have a preference for having the word
capacity somewhere in the name of this operation. Otherwise it is not
obvious that Reserve and Capacity are related.
****************************************************************
From: Matthew Heaney
Sent: Tuesday, October 5, 2004 10:24 AM
But you can always say at the point of call:
Reserve (V, Capacity => N);
****************************************************************
From: Randy Brukardt
Sent: Tuesday, October 5, 2004 10:47 AM
When Matt suggested this to me, I thought he meant
Reserve_Capacity
which seemed better (at least on first thought) than Ensure_Capacity. But I
don't like "Reserve" by itself, either.
****************************************************************
From: Robert I. Eachus
Sent: Tuesday, October 5, 2004 4:00 PM
I think that makes more sense. Reserve can either be a noun or a verb.
Reserve_Capacity isn't completely unambiguous, but it is unlikely anyone
will misunderstand the intent.
****************************************************************
From: Pascal Leroy
Sent: Wednesday, October 6, 2004 1:55 AM
I agree that Reserve_Capacity is better than Ensure_Capacity.
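[Editor's note: A minimal sketch of the operation under its eventual name,
assuming Ada.Containers.Vectors as later standardized in Ada 2005. The
names Reserve_Demo, Integer_Vectors, and N are illustrative only. The call
uses named association, as suggested earlier in this thread.]

```ada
with Ada.Containers.Vectors;

procedure Reserve_Demo is
   use type Ada.Containers.Count_Type;

   package Integer_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   V : Integer_Vectors.Vector;
   N : constant Ada.Containers.Count_Type := 100;
begin
   --  Named association keeps the word "capacity" visible at the call.
   V.Reserve_Capacity (Capacity => N);

   --  Self-check: storage is reserved, but no elements were added.
   if V.Capacity < N or else V.Length /= 0 then
      raise Program_Error;
   end if;
end Reserve_Demo;
```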
****************************************************************
From: Matthew Heaney
Sent: Monday, October 4, 2004 7:03 PM
The vectors package has operations like:
function To_Vector (Count : Count_Type)
return Vector;
function To_Vector (Item : Element_Type; Count : Count_Type)
return Vector;
function "&" (L, R : Vector) return Vector;
...etc
The set package has operations like:
function Union (L, R : Set) return Set;
function Intersection (L, R : Set) return Set;
...etc
That is, each of these tagged types has primitive operations that return the
type. (I refer to functions that return the type as "constructors", or
"ctors" for short.) This means that during a derivation, the derived type
must either be declared as abstract, or the constructors must be overridden.
This is fine if your derivation is public, and the derived type is intended
to be used as a member of that class:
with Integer_Vectors;
package P is
type T is new Integer_Vectors.Vector with private;
...
end P;
In this case, you probably wouldn't object too much to overriding the ctors.
However, what I often do is to implement the full view of a type as a
private derivation, like this:
package Q is
type T is private;
...
private
package Integer_Vectors is new Vectors (Integer);
type T is new Integer_Vectors.Vector with null record;
end Q;
The issue here is that I am forced to override the vector ctors that were
inherited by T. But here I don't really care about those operations, which
aren't used to implement T, so no one will call them anyway.
I don't know whether this is an issue. After all, you can implement the
full view of T as a record, and declare V as a component.
One possibility is to arrange for the ctors to be non-primitive, something
like:
generic
...
package Vectors is
type Vector is tagged private;
package Constructors is
function To_Vector (Count : CT) return Vector;
function To_Vector (Item : ET; Count : CT) return Vector;
...
end Constructors;
...
end Vectors;
In the declaration above, the constructor operations aren't primitive for
type Vector, and so aren't inherited. However, this does mean that to use a
constructor, you have to make those operations visible:
declare
V1 : Vector := Constructors.To_Vector (N);
use Constructors;
V2 : Vector := To_Vector (N);
begin
...
end;
The issue was on my mind, and so I just wanted to see whether anyone else
had an opinion on the matter.
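[Editor's note: A minimal sketch of the record-component alternative
mentioned in this message, assuming Ada.Containers.Vectors as later
standardized in Ada 2005. The operations Push and Size and the name
Wrapper_Demo are hypothetical and chosen only for illustration. Because T
is a record and not a derivation, no vector functions are inherited, so
nothing has to be overridden.]

```ada
with Ada.Containers.Vectors;

procedure Wrapper_Demo is
   package Q is
      type T is private;
      procedure Push (Item : in out T; Value : Integer);
      function Size (Item : T) return Natural;
   private
      package Integer_Vectors is new Ada.Containers.Vectors
        (Index_Type => Positive, Element_Type => Integer);

      --  The vector is a component, so T inherits none of its operations.
      type T is record
         V : Integer_Vectors.Vector;
      end record;
   end Q;

   package body Q is
      procedure Push (Item : in out T; Value : Integer) is
      begin
         Item.V.Append (Value);
      end Push;

      function Size (Item : T) return Natural is
      begin
         return Natural (Item.V.Length);
      end Size;
   end Q;

   X : Q.T;
begin
   Q.Push (X, 10);
   Q.Push (X, 20);

   --  Self-check: both values were stored in the wrapped vector.
   if Q.Size (X) /= 2 then
      raise Program_Error;
   end if;
end Wrapper_Demo;
```

The cost of this form is that each exposed operation needs a hand-written
wrapper such as Push above.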
****************************************************************
From: Tucker Taft
Sent: Monday, October 4, 2004 7:58 PM
I agree that sometimes it makes more sense to make
constructors non-primitive. Putting the constructor
operations in a child package called "Factory" or
equivalent is something we do here pretty frequently.
However, I wouldn't consider "&" and Union/Intersection
to be constructors. They are operators, and they
have operands of the type. I think the kind
of constructors that belong in a factory are
typically very closely tied to the underlying
implementation of the type, rather than to its
abstract interface. To_Vector is somewhat a borderline
case, since it makes sense almost independently of
the way the vector is implemented. On the other
hand, if there were a constructor that took the
initial capacity, and perhaps a specification of
how much to expand on each expansion, etc., that
would seem very closely tied to a particular
implementation, and would belong in a factory
child, or equivalent.
****************************************************************
From: Pascal Leroy
Sent: Tuesday, October 5, 2004 2:47 AM
> In this case, you probably wouldn't object too much to
> overriding the ctors.
The constructors are fine as they are, i.e., they should be primitive.
I think the example above is the important one, and we want it to work
right. If we didn't expect users to extend the containers, we wouldn't
have made them visibly tagged, right?
If To_Vector is not primitive, you have no way to create a T with Count
elements. In particular, you cannot use an extension aggregate, because T
is a private extension. If you believe that To_Vector is generally useful
(and you do) then there is no reason why it wouldn't be useful for T, too.
And yes, you'll have to override it, but that's no big deal.
> However, what I often do is to implement the full view of a
> type as a private derivation, like this:
>
> package Q is
> type T is private;
> ...
> private
>
> package Integer_Vectors is new Vectors (Integer);
>
> type T is new Integer_Vectors.Vector with null record;
> end Q;
>
> The issue here is that I am forced to override the vector
> ctors that were inherited by T. But here I don't really care
> about those operations, which aren't used to implement T, so
> no one will call them anyway.
You tell me that the constructors aren't used to implement T, but I have
no reason to believe you. I can imagine many reasons why you would want
to call some of the constructors for T in the body of Q. For instance, to
implement "&", you could first call To_Vector to create an empty vector to
contain the result of the catenation.
If you know that you will never call a given constructor (including by
dispatching) then you can probably write a 2-line body that returns a
dummy value. No big deal.
Also remember that in general we want controlling results to work right.
It would be very confusing if (as suggested by Tuck) some functions were
primitives and others were not. This would be likely to cause mysterious
Tag_Errors.
****************************************************************
From: Florian Weimer
Sent: Tuesday, October 5, 2004 4:32 AM
> I think the example above is the important one, and we want it to work
> right. If we didn't expect users to extend the containers, we wouldn't
> have made them visibly tagged, right?
I thought the main motivation for making them tagged was to enable the
industry-standard method invocation syntax for them.
****************************************************************
From: Pascal Leroy
Sent: Wednesday, October 6, 2004 1:55 AM
This was not the "main motivation", although this was certainly one aspect
discussed when this decision was made in Palma. The motivation was that
tagged types are more flexible in many respects (in particular, guess
what, you can extend them) and since the implementation has to be a
controlled type anyway, we might as well expose the tagged-ness to the
user.
With the addition of interfaces, I would actually expect that mixins
involving containers and user-defined interfaces would be quite common in
programs making heavy use of the OOP paradigm.
****************************************************************
From: Matthew Heaney
Sent: Wednesday, October 6, 2004 12:25 PM
The reasoning was that since these are tagged anyway (because the type
must privately derive from Controlled), then we might as well make them
publicly tagged.
This has benefits besides allowing type extension, for example, tagged
type subprogram parameters are implicitly aliased, and you can use
distinguished-receiver syntax.
However, there is a cost, and that is when a derivation occurs, you must
override all the primitive functions that return the type.
This is a pain, when you simply want the convenience of implementing
some other type as a private derivation from vector (say):
package P is
type T is private;
...
private
type T is new Vector_Types.Vector with null record;
--must override To_Vector, etc
end P;
The locution above is a common Ada idiom (in fact, it's how all the
containers are implemented).
In this case, however, the convenience of deriving from Vector is
outweighed by having to override 6 vector functions, none of which are
needed to implement T. It is in this sense that making the vector type
tagged isn't free.
I support the decision to make the containers tagged (for the two
benefits I list above), but my concern is the cost of derivation. But
maybe container derivation isn't common enough to worry about.
You can eliminate this cost by making functions that return the type
non-primitive, but that of course has its own costs. If you're doing
any kind of polymorphic programming, you wouldn't be able to dispatch on
the tag of the function result. But then again, polymorphic programming
of containers seems a little bizarre, so maybe it isn't common enough to
worry about.
> With the addition of interfaces, I would actually expect that mixins
> involving containers and user-defined interfaces would be quite common in
> programs making heavy use of the OOP paradigm.
I am skeptical. In the vast majority of cases, the type of the
container is known statically. I can't imagine why anyone would ever
need a polymorphic class of, say, integer vectors. Any mixing of
containers can be done entirely using static mechanisms (cursors and
iterators).
****************************************************************
From: Randy Brukardt
Sent: Wednesday, October 6, 2004 12:40 PM
> I support the decision to make the containers tagged (for the two
> benefits I list above), but my concern is the cost of derivation. But
> maybe container derivation isn't common enough to worry about.
Well, if the containers weren't tagged, then there wouldn't be any (useful)
derivation. So this is only saying that it isn't quite as easy to derive as
we might like. The choice really seems to be between not allowing derivation
at all or having it be more painful than we'd like.
But moving various operators into a child package (which would require a
separate instantiation) or a nested package seems weird. That's especially
true for sets; do we really want "Union" to be non-primitive? (And then
again you couldn't use the prefix notation for calls.)
****************************************************************
From: Matthew Heaney
Sent: Wednesday, October 6, 2004 1:02 PM
> Well, if the containers weren't tagged, then there wouldn't be any (useful)
> derivation.
I gave an example of what I consider to be a very useful derivation:
package P is
type T is private;
...
private
type T is new Vector_Types.Vector; --no extension if not tagged
end P;
I bring this up because I actually attempted to declare a type as above,
but ended up abandoning that approach when I was forced to override the
primitive functions.
But it's no big deal, since I was able to solve the problem another way.
****************************************************************
From: Nick Roberts
Sent: Tuesday, October 5, 2004 11:58 AM
Would it make sense now to allow conversion towards an extension type?
A conversion:
TT(X)
where X was an object of tagged type T, and T was an ancestor type of TT,
could be defined as being equivalent to:
TT'(X with others => <>)
thus requiring any extension components to have default initialisations.
It would then be possible to rescind the rule that a primitive function
with a controlling result is abstract or must be overridden [RM2K 3.9.3
(4-6)]. Well, I guess so, anyway. This was always a bit of an awkward rule,
wasn't it?
****************************************************************
From: Robert A. Duff
Sent: Tuesday, October 5, 2004 1:15 PM
> Would it make sense now to allow conversion towards an extension type?
Sounds dangerous, to me.
> A conversion:
>
> TT(X)
>
> where X was an object of tagged type T, and T was an ancestor type of TT,
> could be defined as being equivalent to:
>
> TT'(X with others => <>)
>
> thus requiring any extension components to have default initialisations.
There's no such requirement, as far as I know. That is, "with others =>
<>" means "use the default, if any". For integer components with no
explicit ":=...", it means "default initialize it to any old garbage".
So there's some value in requiring an explicit "<>" when that is what is
wanted.
Besides, downward conversions are view conversions, and are allowed only
for class-wide operands, and there's a tag check. Making TT(X)
equivalent to an extension aggregate doesn't fit in well with that.
> It would then be possible to rescind the rule that a primitive function
> with a controlling result is abstract or must be overridden [RM2K 3.9.3
> (4-6)]. Well, I guess so, anyway. This was always a bit of an awkward rule,
> wasn't it?
I wouldn't say it's awkward -- it's necessary, to make sure the
extension components are not forgotten. I suppose it might make sense
to rescind that rule when the extension is "with null record". I think
we considered that during the Ada 9X design, but decided it wasn't
worthwhile to have such a special case.
****************************************************************
From: Nick Roberts
Sent: Wednesday, October 6, 2004 5:50 AM
Robert A Duff wrote:
>>Would it make sense now to allow conversion towards an extension type?
>
> Sounds dangerous, to me.
> ...
> Besides, downward conversions are view conversions, and are allowed only
> for class-wide operands, and there's a tag check. Making TT(X)
> equivalent to an extension aggregate doesn't fit in well with that.
I actually agree, on reflection.
> I wouldn't say it's awkward -- it's necessary, to make sure the
> extension components are not forgotten. I suppose it might make sense
> to rescind that rule when the extension is "with null record". I think
> we considered that during the Ada 9X design, but decided it wasn't
> worthwhile to have such a special case.
I think maybe it would be worthwhile, on the grounds that it isn't, in
fact, such a special case. Perhaps the ARG should consider this again?
****************************************************************
From: Tucker Taft
Sent: Wednesday, October 6, 2004 10:16 AM
> I think maybe it would be worthwhile, on the grounds that it isn't, in
> fact, such a special case. Perhaps the ARG should consider this again?
Even for a null extension, there might be additional operations
which are not present in the dispatch table of the type of
the object. This really doesn't work. What you want is
an extension aggregate.
****************************************************************
From: Randy Brukardt
Sent: Wednesday, October 6, 2004 12:58 PM
I've posted the updated AI-302-3 (Containers) to the web site. Find it
through
http://www.ada-auth.org/ais.html
(the file name is AI-20302.TXT) [This is version /07 of the AI - ED]
There is a list of changes beyond those discussed in Madison at the end of
the !appendix section (the very end of the AI file).
Comments are welcome, but keep in mind that we're planning to approve the AI
at the meeting next month (else it may not make the Amendment). So major
overhauls aren't practical. We're just looking to improve the details at
this point.
****************************************************************
From: Jeff Carter
Sent: Wednesday, October 6, 2004 1:49 PM
I have a minor complaint about the ordering of operations in the package
specifications. When using a container, I generally want to know how to
create one, how to put things in it, how to access things that are in
it, and how to delete things from it. The next most common thing is to
make it empty. Therefore, I think these operations should come first in
the specs. Right now they tend to be scattered around, separated by less
common operations. For example, looking at vectors, is "&" really more
common than Insert, Update, Replace (these 2 seem to be the same, so I
don't know why the names differ), and Delete?
One also needs to be able to determine locations within containers
(cursors and indices for vectors) in order to do these common
operations, so operations that provide locations should also be among
the first in the spec.
The container library proposed seems to be significantly different from
the STL, not to mention much smaller. The references to the STL in the
AI do not seem to add anything, and should probably be eliminated.
****************************************************************
From: Tucker Taft
Sent: Thursday, October 7, 2004 5:59 AM
I hate to weigh in on this now, but...
It is a fairly common paradigm to instantiate a package
and then use derivation to bring the type into the
current scope. E.g:
package T_Vecs is new Vectors(T);
type T_Vec is new T_Vecs.Vector;
Making the type tagged and giving it various operations
that are functions returning the type does defeat this
approach (since "with null record;" wouldn't work without
having to override all of the functions).
There seem to be a few alternatives to deal with this:
a) live with it as is
b) define "type NT is new T with null record;" to
provide default implementations of such functions
by implicitly providing an extension aggregate at the
point of call, e.g.: "Union(X,Y)" for NT is equiv to
"NT'(Union(T(X),T(Y)) with null record)"
c) make types untagged, and support object.op syntax
on untagged record and private types.
I kind of like option (b) as we don't want to force the
use of untagged types to enable this paradigm.
[ASIDE: This issue also reminds me of the problem we never fixed:
type T is private;
function "+"(X, Y: T) return T;
private
type T is new Integer;
function "+"(X, Y: T) return T renames <>; -- inventing here
It is pretty often that you want a private type to expose
some but not all of the operations of the full type.
Some kind of renaming would be great. Right now,
you have to write wrappers for each such operation,
which is a bit of a pain. End of ASIDE.]
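[Editor's note: A minimal sketch of what option (b) would supply
implicitly, written out by hand as an explicit override that returns an
extension aggregate. The toy type T, its Value component, and this Union
are hypothetical stand-ins and not the container packages. The names
Null_Extension_Demo, Base, and Derived are illustrative only.]

```ada
procedure Null_Extension_Demo is
   package Base is
      type T is tagged record
         Value : Integer := 0;
      end record;
      function Union (X, Y : T) return T;
   end Base;

   package body Base is
      function Union (X, Y : T) return T is
      begin
         return (Value => X.Value + Y.Value);
      end Union;
   end Base;

   package Derived is
      type NT is new Base.T with null record;
      overriding function Union (X, Y : NT) return NT;
   end Derived;

   package body Derived is
      --  Call the parent operation, then wrap its result in an
      --  extension aggregate, as in the equivalence quoted above.
      overriding function Union (X, Y : NT) return NT is
      begin
         return NT'(Base.Union (Base.T (X), Base.T (Y)) with null record);
      end Union;
   end Derived;

   A, B, C : Derived.NT;
begin
   A.Value := 1;
   B.Value := 2;
   C := Derived.Union (A, B);

   --  Self-check: the parent's result carried through the wrapper.
   if C.Value /= 3 then
      raise Program_Error;
   end if;
end Null_Extension_Demo;
```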
****************************************************************
From: Nick Roberts
Sent: Thursday, October 7, 2004 8:57 AM
This does all seem to be suggesting the introduction of a new form of
declaration, the 'default completion'.
default_completion ::=
subprogram_specification [IS subprogram_default] ;
obviously similar to a formal subprogram declaration. This declaration
would be allowed anywhere a subprogram body is allowed, and would form the
completion of a subprogram, declared in the visible part of a package,
which is a primitive operation of a private type T. The type declaration in
the private part of the package must be a type derivation declaration; let
the type from which it is derived be called P.
The subprogram's body would be formed from the body of the corresponding
operation of P, with every parameter of type P converted to T, and each
other occurrence of P replaced by T. The conversion would be a view
conversion for non-tagged types, and a conversion of the form Tuck
suggested for a tagged type which added no extra components.
Default completions would be disallowed for a tagged type which did add
components.
The idea is that we could derive from a container thus:
package Foo is
package T_Vecs is new Ada.Containers.Vectors(T);
type T_Vec is private;
function Length (Container : T_Vec) return Count_Type;
function Is_Empty (Container : T_Vec) return Boolean;
procedure Clear (Container : in out T_Vec);
procedure Append (Container : in out T_Vec;
New_Item : in T_Vec);
... -- other vector operations we want to expose
private
type T_Vec is new T_Vecs.Vector with null record;
...
end;
package body Foo is
...
function Length (Container : T_Vec) return Count_Type is <>;
function Is_Empty (Container : T_Vec) return Boolean is <>;
procedure Clear (Container : in out T_Vec) is <>;
procedure Append (Container : in out T_Vec;
New_Item : in T_Vec) is <>;
...
end Foo;
We still have to explicitly declare the operations we wish to inherit, and
their completions (in the package body), but at least the form of the
completions is succinct and clear (making it explicit that the operations
are direct copies).
We could also have:
package Bar is
type T is private;
function "+"(X, Y: T) return T;
...
private
type T is new Integer;
...
end;
package body Bar is
function "+"(X, Y: T) return T is <>;
...
end Bar;
****************************************************************
From: Dan Eilers
Sent: Thursday, October 7, 2004 4:35 PM
> [ASIDE: This issue also reminds me of the problem we never fixed:
>
> type T is private;
> function "+"(X, Y: T) return T;
> private
> type T is new Integer;
> function "+"(X, Y: T) return T renames <>; -- inventing here
>
> It is pretty often that you want a private type to expose
> some but not all of the operations of the full type.
> Some kind of renaming would be great. Right now,
> you have to write wrappers for each such operation,
> which is a bit of a pain. End of ASIDE.]
I am in favor of fixing this problem, having seen real customer
code that did this. Note that there is a hazard in trying to write
the wrapper workaround, in that it is easy to accidentally infinitely
recurse.
****************************************************************
From: Dan Eilers
Sent: Thursday, October 7, 2004 4:57 PM
> Making the type tagged and giving it various operations
> that are functions returning the type does defeat this
> approach (since "with null record;" wouldn't work without
> having to override all of the functions).
>
> There seem to be a few alternatives to deal with this:
...
Another alternative might be to allow type renaming:
package T_Vecs is new Vectors(T);
type T_Vec renames T_Vecs.Vector;
with the semantics that you want.
****************************************************************
From: Martin Krischik
Sent: Friday, October 8, 2004 4:01 AM
I always thought that type renaming was not added because subtypes do the
same.
subtype T_Vec is T_Vecs.Vector;
Mind you: in my first month of Ada I actually did try to rename types, and I
wondered why it was not possible. Then the textbook told me that "subtype" is
the thing to do.
> with the semantics that you want.
Given that subtypes and type renaming are supposed to be semantically the
same, I wonder whether type renaming should be allowed as a syntax option,
since it would often be closer to what the programmer wants to express.
****************************************************************
From: Tucker Taft
Sent: Friday, October 8, 2004 10:24 PM
Type renaming creates a realm of thorny issues, relating
for example, to where primitives are (re)declared.
I don't believe this is a time to open up discussion of
a completely new syntactic feature.
On the other hand, I think we do need to be sensitive
to whether there are some minor "tweaks" of the various
proposals that will make them work better together.
****************************************************************
From: Christoph Grein
Sent: Thursday, October 7, 2004 6:24 AM
> b) define "type NT is new T with null record;" to
> provide default implementations of such functions
> by implicitly providing an extension aggregate at the
> point of call, e.g.: "Union(X,Y)" for NT is equiv to
> "NT'(Union(T(X),T(Y)) with null record)"
I like this proposal.
> [ASIDE: This issue also reminds me of the problem we never fixed:
>
> type T is private;
> function "+"(X, Y: T) return T;
> private
> type T is new Integer;
> function "+"(X, Y: T) return T renames <>; -- inventing here
>
> It is pretty often that you want a private type to expose
> some but not all of the operations of the full type.
> Some kind of renaming would be great. Right now,
> you have to write wrappers for each such operation,
> which is a bit of a pain. End of ASIDE.]
And I have often grumbled about this, too, and like Tuck's invention, which
is fully upward compatible.
But I fear it's too late for Ada0Y.
****************************************************************
From: Nick Roberts
Sent: Thursday, October 7, 2004 10:55 AM
I wrote:
> default_completion ::=
> subprogram_specification [IS subprogram_default] ;
Obviously I should have written:
default_completion ::=
subprogram_specification IS subprogram_default ;
****************************************************************
From: Matthew Heaney
Sent: Monday, October 25, 2004 11:36 AM
Tucker Taft wrote:
>
> It is a fairly common paradigm to instantiate a package
> and then use derivation to bring the type into the
> current scope. E.g:
>
> package T_Vecs is new Vectors(T);
> type T_Vec is new T_Vecs.Vector;
>
> Making the type tagged and giving it various operations
> that are functions returning the type does defeat this
> approach (since "with null record;" wouldn't work without
> having to override all of the functions).
I have been corresponding with someone who is teaching a class in data
structures. He was trying to use the vector container to implement a stack:
generic
type ET is private;
with function "=" (L, R : ET) return Boolean is <>;
package Stacks is
type Stack is private;
procedure Push (Container : in out Stack;
New_Item : in ET);
procedure Pop (Container : in out Stack);
private
package ET_Vectors is
new Ada.Containers.Vectors (Positive, ET, "=");
type Stack is
new ET_Vectors.Vector with null record;
--won't compile as is
end Stacks;
He was confused about the compiler error messages, stating that he had
to override the function To_Vector, etc. (He knows Ada83, but he's
still learning Ada95.)
I bring this up as a real-life example of the fact that private
derivation is a very natural Ada idiom, since to him that was the most
obvious solution.
****************************************************************
From: Ehud Lamm
Sent: Monday, October 25, 2004 11:56 AM
When I first started doing this sort of thing in Ada I was confused myself,
as I am pretty sure my students will be.
Alas, I don't see a good solution. The child package approach seems to be as
confusing, if not more.
****************************************************************
From: Dan Eilers
Sent: Monday, October 25, 2004 12:21 PM
Isn't "type renaming" exactly the solution you're looking for?
It seems you don't really want to make Stack a new type derived from
ET_Vectors.Vector, instead you want to say that Stack is implemented by,
or in other words, renames ET_Vectors.Vector.
****************************************************************
From: Matthew Heaney
Sent: Saturday, October 9, 2004 12:47 AM
I just had a quick(?) question about dope vectors for arrays. Suppose I
have this declaration:
declare
type String_Access is access all String;
S : aliased String (1 .. 10);
X : String_Access := S'Access; --not legal Ada95
begin
The declaration of X is illegal, since S doesn't have a dope vector. I can
do this:
declare
type String_Access is access all String;
S : aliased String := String'(1 .. 10 => ' ');
X : String_Access := S'Access; --OK
begin
Was this behavior liberalized in Ada 2005? Is there a way for the
programmer to say: "give me a dope vector for this array object", so that
the former declaration would be legal?
What about a record component:
type RT (N : Natural) is record
S : aliased String (1 .. N);
end record;
R : RT;
X : String_Access := R.S'Access;
I was thinking about the functions-that-return-anonymous-access-types, to
handle this case:
type T is limited private;
function S (O : access T) return access String;
private
type T is limited record
S : aliased String (1 .. 10); --or maybe this is a discriminant
end record;
function S (O : access T) return access String is
begin
return O.S'Access;
end;
...
declare
O : aliased T;
SS : String renames S (O'Access).all;
begin
Just curious...
****************************************************************
From: Tucker Taft
Sent: Saturday, October 9, 2004 11:19 AM
> Was this behavior liberalized in Ada 2005?
No.
> ... Is there a way for the
> programmer to say: "give me a dope vector for this array object", so that
> the former declaration would be legal?
No.
>
> What about a record component:
>
> type RT (N : Natural) is record
> S : aliased String (1 .. N);
> end record;
>
> R : RT;
> X : String_Access := R.S'Access;
No. You can't get there from here.
> ...
> Just curious...
Good question, but we didn't make fixing this a priority.
There is no obvious fix other than to say that all
aliased arrays must have dope vectors pre-allocated in
a way that would allow an access-to-unconstrained pointer
to point at them. That probably would have been the
right answer in Ada 95, in retrospect, but changing it
now could break some working code in bizarre ways,
as it would require a change in representation for
existing data types.
****************************************************************
From: Matthew Heaney
Sent: Thursday, October 21, 2004 11:28 PM
!standard A.17 04-10-04 AI95-00302-03/07
!subject Container library
My review of the post-Madison AI-302 draft follows.
As usual, each comment is bracketed with "MJH:" and "ENDMJH." pairs, and
immediately follows the text to which it refers.
I haven't copied the entire AI draft here. Rather, I give just enough
context to determine the relevant section.
I can summarize most of my comments as:
(1) If we decide to keep the new cursor-based replace operation for
sets, then it should be named "Replace_Element", not "Replace".
(2) We need to get rid of the key-based replace operation for sets, the
operation named "Replace" in the nested package Generic_Keys.
(3) The set operation named "Checked_Update_Element" declared in the
nested package Generic_Keys should be named just "Update_Element".
(4) We need to get rid of the requirement that an implementation detect
container modification while passive iteration is in progress. This
requirement is unnecessary since we already have a meta-rule that
says container behavior is unspecified if a container object is
simultaneously read from and written to. This rule of course
applies even if it's the same task doing the reading and writing.
(5) This API needs to state unambiguously that a container implementation
must support multiple tasks simultaneously reading from a container
object. In particular it is perfectly legal for multiple tasks to
simultaneously perform passive iteration over a container.
(6) This API needs(?) to state that there are no container operations
that are "potentially blocking."
A.17 Containers
...
Note that the language already includes several requirements that are
important to the use of containers. First, library packages must be
reentrant - multiple tasks can use the packages as long as they operate on
separate containers. Thus, it is only necessary for a user to protect a
container if a single container needs to be used by multiple tasks.
MJH:
We need to be clear here about multithreading issues, since that last
sentence is wrong.
The only problem case is when there are multiple writers, or a single
writer and one or more readers. (The reader and writer can also be the
same task.)
It is definitely *not* an error for multiple readers to access the same
container all simultaneously.
In particular, it is perfectly acceptable (in fact, the API is designed
to facilitate this) for multiple tasks to be iterating over the same
container object, using either cursors or the passive iterator.
We already have a rule that says container behavior is not defined when
a container is simultaneously written to and read from. The rule
applies whether it is one task or more than one task.
ENDMJH.
Second, the language requires that language-defined types stream "properly".
That means that the stream attributes can be used to implement persistence
of containers when necessary, and containers can be passed between
partitions of a program.
...
A.17.2 The Package Containers.Vectors
...
package Ada.Containers.Vectors is
...
function To_Vector (Length : Count_Type) return Vector;
function To_Vector
(New_Item : Element_Type;
Length : Count_Type) return Vector;
function "&" (Left, Right : Vector) return Vector;
function "&" (Left : Vector;
Right : Element_Type) return Vector;
function "&" (Left : Element_Type;
Right : Vector) return Vector;
function "&" (Left, Right : Element_Type) return Vector;
MJH:
We have already discussed the fact that making these functions primitive
means that they must be overridden during a derivation, since the
function return type is Vector. (There is a similar issue for sets.)
This is kind of a pain, since it's very common to implement the full
view of a private type as a null extension of some other (tagged) type,
or to derive from a type in order to bring its primitive operations into
local scope.
We can either live with this feature, arrange to make these operations
non-primitive, or modify the language such that, say, a null extension
inherits a default implementation of these functions.
ENDMJH.
...
procedure Delete (Container : in out Vector;
Index : in Extended_Index;
Count : in Count_Type := 1);
MJH:
See my comments below about the subtype and semantics of parameter Index
for the index-based Delete operation.
ENDMJH.
...
generic
with function "<" (Left, Right : Element_Type)
return Boolean is <>;
procedure Generic_Sort (Container : in Vector);
MJH:
Another operation that might be useful is a binary search over a sorted
vector:
generic
with function "<" (Left, Right : Element_Type)
return Boolean is <>;
function Generic_Binary_Search (Container : Vector;
Item : Element_Type)
return Extended_Index;
(It's just an idea...)
ENDMJH.
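[Editor's note: A minimal sketch of what such a search might look like, written outside the vector package against the Ada.Containers.Vectors interface as eventually standardized. It assumes the vector is already sorted with respect to "<", and it returns No_Index when no equivalent element is found; that return convention is an assumption, since the message above does not specify one. Binary_Search_Demo, Int_Vectors, and Search are hypothetical names.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Binary_Search_Demo is

   package Int_Vectors is new Ada.Containers.Vectors (Positive, Integer);
   use Int_Vectors;

   generic
      with function "<" (Left, Right : Integer) return Boolean is <>;
   function Generic_Binary_Search
     (Container : Vector;
      Item      : Integer) return Extended_Index;

   function Generic_Binary_Search
     (Container : Vector;
      Item      : Integer) return Extended_Index
   is
      Low  : Integer := First_Index (Container);
      High : Integer := Last_Index (Container);   --  First - 1 when empty
      Mid  : Integer;
   begin
      while Low <= High loop
         Mid := Low + (High - Low) / 2;
         if Element (Container, Mid) < Item then
            Low := Mid + 1;
         elsif Item < Element (Container, Mid) then
            High := Mid - 1;
         else
            --  Neither is less than the other: equivalent element found.
            return Mid;
         end if;
      end loop;
      return No_Index;   --  assumed "not found" result
   end Generic_Binary_Search;

   function Search is new Generic_Binary_Search;

   V : Vector;

begin
   for I in 1 .. 5 loop
      Append (V, 10 * I);   --  10, 20, 30, 40, 50
   end loop;

   Ada.Text_IO.Put_Line (Extended_Index'Image (Search (V, 30)));
   Ada.Text_IO.Put_Line (Extended_Index'Image (Search (V, 35)));
end Binary_Search_Demo;
```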
...
procedure Delete (Container : in out Vector;
Index : in Extended_Index;
Count : in Count_Type := 1);
If Count is 0, the operation has no effect. If Index does not specify a
value in the range First_Index (Container) .. Last_Index (Container), then
Constraint_Error is propagated. Otherwise Delete slides the active
elements (if any) starting at Index plus Count down to Index. Any
exceptions raised during element assignment are propagated.
MJH:
The semantics wrt the Index parameter are arguably inconsistent with the
semantics of the cursor-based delete. Any index value outside of the
range IT'First .. C.Last is technically the same as "not Has_Element",
so you could make an argument that it should be treated the same as the
cursor-based delete (meaning that it should be a no-op).
ENDMJH.
...
procedure Swap (I, J : in Cursor);
If either I or J is No_Element, then Constraint_Error is propagated. If I
and J designate elements in different containers, then Program_Error is
propagated. Otherwise Swap exchanges the values of the elements designated
by I and J.
MJH:
The ARG needs to confirm whether the second sentence is really correct.
The semantics of Swap are equivalent to:
procedure Swap (I, J : Cursor) is
EI : constant ET := Element (I);
begin
Replace_Element (I, By => Element (J));
Replace_Element (J, By => EI);
end;
There's nothing here that would preclude I and J from designating
elements in different containers, so it's not clear why this is an error.
ENDMJH.
...
procedure Iterate
(Container : in Vector;
Process : not null access procedure (Position : in Cursor));
Invokes Process.all with a cursor that designates each element in Container,
in index order. Any exception raised by Process is propagated.
Program_Error is propagated if:
* Process.all attempts to insert or delete elements from Container; or
* Process.all finalizes Container; or
* Process.all calls Move with Container as a parameter.
AARM Note:
This check takes place when the operations that insert or delete elements,
etc. are called. There is no check needed if an attempt is made to insert or
delete nothing (that is, Count = 0 or Length(Item) = 0).
The check is easy to implement: each container needs a counter. The counter
is incremented when Iterate is called, and decremented when Iterate completes.
If the counter is nonzero when an operation that inserts or deletes is called,
Finalize is called, or one of the other operations in the list occurs,
Program_Error is raised.
Swap and Generic_Sort are not included here, as they only copy elements.
End AARM Notes.
MJH:
Au contraire: the counter-based check described above isn't adequate for
detecting container modification during passive iteration, since it
won't work in the presence of multiple reader tasks.
We already have a rule that says container behavior is undefined if a
container is simultaneously read from and written to. This rule applies
even if it's the same task doing the simultaneous reading and writing.
Hence the requirement above is entirely superfluous, and therefore it
should be removed.
A cursor does not need to confer any safety benefits beyond what an
access type provides. (This is especially true for a vector, which
implements a cursor as a wrapper around an index.)
ENDMJH.
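[Editor's note: For readers following the argument, here is a minimal single-task sketch of the counter-based check described in the AARM note. Toy_Vector, Busy, Tamper, and Busy_Counter_Demo are hypothetical names, and the fixed-size array merely stands in for a real vector. The counter is an ordinary unsynchronized component, which is the property the objection about multiple reader tasks turns on.]

```ada
with Ada.Text_IO;

procedure Busy_Counter_Demo is

   type Int_Array is array (1 .. 8) of Integer;

   --  Limited, so objects are passed by reference and the callback
   --  observes the same Busy counter that Iterate updates.
   type Toy_Vector is limited record
      Items : Int_Array := (others => 0);
      Last  : Natural   := 0;
      Busy  : Natural   := 0;   --  number of Iterate calls in progress
   end record;

   procedure Append (V : in out Toy_Vector; Item : Integer) is
   begin
      if V.Busy > 0 then
         raise Program_Error;   --  insertion during passive iteration
      end if;
      V.Last := V.Last + 1;
      V.Items (V.Last) := Item;
   end Append;

   procedure Iterate
     (V       : in out Toy_Vector;
      Process : not null access procedure (Item : Integer)) is
   begin
      V.Busy := V.Busy + 1;
      begin
         for I in 1 .. V.Last loop
            Process (V.Items (I));
         end loop;
      exception
         when others =>
            V.Busy := V.Busy - 1;   --  restore the counter before leaving
            raise;
      end;
      V.Busy := V.Busy - 1;
   end Iterate;

   V      : Toy_Vector;
   Caught : Boolean := False;

   procedure Tamper (Item : Integer) is
   begin
      Append (V, Item);   --  attempts to insert while Iterate is active
   end Tamper;

begin
   Append (V, 1);
   Append (V, 2);

   begin
      Iterate (V, Tamper'Access);
   exception
      when Program_Error =>
         Caught := True;
   end;

   Ada.Text_IO.Put_Line ("Tampering detected: " & Boolean'Image (Caught));
end Busy_Counter_Demo;
```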
...
A.17.3 The Package Containers.Doubly_Linked_Lists
...
procedure Swap (I, J : in Cursor);
If either I or J is No_Element, then Constraint_Error is propagated. If I
and J designate elements in different containers, then Program_Error is
propagated. Otherwise Swap exchanges the values of the elements designated
by I and J.
AARM Notes:
After a call to Swap, I designates the element value previously
designated by J, and J designates the element value previously
designated by I. The cursors do not become ambiguous from this operation.
AARM Notes: To Be Honest: The implementation is not required to actually
copy the elements if it can do the swap some other way. But it is allowed
to copy the elements if needed.
MJH:
The ARG needs to confirm the behavior when I and J designate elements in
different containers. See my comment above for vectors.
ENDMJH.
...
procedure Iterate
(Container : in List;
Process : not null access procedure (Position : in Cursor));
Invokes Process.all with a cursor that designates each node in Container.
Any exceptions raised during Process are propagated.
Program_Error is propagated if:
* Process.all attempts to insert or delete elements from Container; or
* Process.all calls a routine that reorders the elements of Container
(Swap_Links, Splice, Generic_Sort, or Generic_Merge); or
* Process.all finalizes Container; or
* Process.all calls Move with Container as a parameter.
AARM Note:
This check takes place when the operations that insert or delete elements,
etc. are called. There is no check needed if an attempt is made to insert or
delete nothing (that is, Count = 0).
See Iterate for vectors for a suggested implementation of the check.
Swap is not included here, as it only copies elements.
End AARM Notes.
MJH:
This requirement is redundant, since we already have a meta-rule that
says container behavior isn't specified if the container object is
simultaneously read from and written to. This rule applies even if it's
the same task doing the reading and writing (as would be the case when a
container is modified during passive iteration).
Note that the suggested implementation of the check doesn't work when
there are multiple reader tasks. A container must support simultaneous
reading by multiple tasks.
At the end of the day, it doesn't really matter whether the container is
modified during iteration, as long as the next node can be reached safely,
and the iteration eventually terminates. This is especially true for a
list. If a user decides to sort the list during iteration, and then
Iterate delivers items in some different order, then what's the problem?
(It's certainly not a problem in the reference implementation.)
The problem case is deleting nodes during iteration. If the user
deletes the current node, then the iterator might not be able to find
the next node (since the next node pointer was stored on the node
deleted). You can handle that by keeping a few nodes in cache, so that
a node retains its value after it has been deleted. (You could do this
using a special storage pool too, of course.)
Here's an example I presented to Randy to illustrate some of these
issues. Suppose the list contains items that are equivalent in some
way, and all the equivalent items are grouped together in sequence,
something like:
a a b b b c d d d d e e
Now we want to write a filter program, that removes all but the first
member of each like sequence:
a b c d e
You could implement such a filter this way:
procedure Filter (L : in out List) is
procedure Process (C : Cursor) is
D : Cursor;
begin
loop
D := Next (C);
if Has_Element (D)
and then Element (D) = Element (C)
then
Delete (L, D); --NOTE: DELETION
else
return;
end if;
end loop;
end Process;
begin
Iterate (L, Process'Access);
end Filter;
Even though this algorithm deletes nodes during passive iteration, it
works if Iterate is implemented this way:
procedure Iterate
(Container : in List;
Process : not null access procedure (Position : in Cursor)) is
Node : Node_Access := Container.First;
begin
while Node /= null loop
Process (Cursor'(Container'Unchecked_Access, Node));
Node := Node.Next;
end loop;
end Iterate;
It's perfectly safe here, since it only deletes nodes that follow the
current node. The real issue is portability, since it won't work if
Iterate is implemented this way:
procedure Iterate
(Container : in List;
Process : not null access procedure (Position : in Cursor)) is
Node : Node_Access := Container.First;
begin
while Node /= null loop
declare
Next : constant Node_Access := Node.Next;
begin
Process (Cursor'(Container'Unchecked_Access, Node));
end;
Node := Next;
end loop;
end Iterate;
The filter algorithm above would be erroneous in this case, since the
node designated by the Next pointer has been deleted.
Now I'm not saying that this API should support container modification
during passive iteration, since the filter algorithm above could be
implemented just as easily using the active iterator. But there is a
big difference between saying that we don't support it, and making a
requirement that the implementation must detect modification during
Iterate and raise an exception.
The moral of the story is that the only thing this API should say about
modification during iteration is that this API doesn't say what happens
when modification during iteration occurs.
ENDMJH.
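[Editor's note: The remark that the filter "could be implemented just as easily using the active iterator" can be sketched as follows, using the Ada.Containers.Doubly_Linked_Lists interface as eventually standardized. Filter_Demo, Char_Lists, Source, and Items are hypothetical names. The cursor C only advances when its successor differs, so every deletion happens strictly after the current position and no passive iteration is in progress.]

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Filter_Demo is

   package Char_Lists is
     new Ada.Containers.Doubly_Linked_Lists (Character);
   use Char_Lists;

   --  Removes all but the first member of each run of equal elements.
   procedure Filter (L : in out List) is
      C : Cursor := First (L);
      D : Cursor;
   begin
      while Has_Element (C) loop
         D := Next (C);
         if Has_Element (D) and then Element (D) = Element (C) then
            Delete (L, D);   --  removes the successor; C stays valid
         else
            C := D;          --  move on to the next run (or No_Element)
         end if;
      end loop;
   end Filter;

   Source   : constant String := "aabbbcddddee";
   Items    : List;
   Position : Cursor;

begin
   for I in Source'Range loop
      Append (Items, Source (I));
   end loop;

   Filter (Items);

   Position := First (Items);
   while Has_Element (Position) loop
      Ada.Text_IO.Put (Element (Position));
      Next (Position);
   end loop;
   Ada.Text_IO.New_Line;
end Filter_Demo;
```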
...
A.17.4 The Package Containers.Hashed_Maps
...
generic
...
package Ada.Containers.Hashed_Maps is
...
function "=" (Left, Right : Map) return Boolean;
MJH:
We have been in discussion about adding another operation, called
Equivalent, for sets and maps. I'm not sure what such an operation
would mean for a map, but I just wanted to write down somewhere that
it's up for discussion in Atlanta.
ENDMJH.
...
procedure Replace_Element (Position : in Cursor;
By : in Element_Type);
...
procedure Replace (Container : in out Map;
Key : in Key_Type;
New_Item : in Element_Type);
MJH:
The introduction of a new set operation has alerted me to the fact that
we have two operations similarly named. (See the set spec for more
comments.)
So far we have named the cursor-based replace operation
"Replace_Element" and its element parameter "By" (this came from
Ada.Strings.*), and named the key-based replace operation "Replace" and
its element parameter "New_Item".
The ARG should confirm whether this difference in naming of the
cursor-based vs. key-based replace operations is intended.
ENDMJH.
...
procedure Iterate
(Container : in Map;
Process : not null access procedure (Position : in Cursor));
Iterate calls Process.all with a cursor that designates each node
in the Container. Any exception raised by Process is propagated.
Program_Error is propagated if:
* Process.all attempts to insert or delete elements from Container; or
* Process.all calls Reserve_Capacity; or
* Process.all finalizes Container; or
* Process.all calls Move with Container as a parameter.
AARM Note:
This check takes place when the operations that insert or delete elements,
etc. are called.
See Iterate for vectors for a suggested implementation of the check.
We have to include Reserve_Capacity here, as rehashing probably will change
the order that elements are stored in the map.
End AARM Notes.
MJH:
We already have a rule that says behavior is unspecified when the
container is simultaneously read from and written to. That includes the
case of a single task that is both the reader and writer (as is the case
with Iterate).
Note that for a hashed container (as for a list) the only real problem
case is when the current node is deleted, since it contains the pointer
to the next node. (Note that if the deleted node retains its value
after it has been deallocated or put in cache, then deleting the current
node isn't really a problem, since you would then be able to reach the
next node.)
It doesn't really matter if Move or Reserve_Capacity is called either,
since the worst thing that happens is you run off the end of the buckets
array, in which case Constraint_Error is propagated. A simple assertion
to check that the buckets index hasn't changed is all you would need to
detect calls to Reserve_Capacity, and a simple assertion check is all
you would need to detect whether the buckets array (pointer) has
changed as a result of Move or Reserve_Capacity.
ENDMJH.
...
A.17.5 The Package Containers.Ordered_Sets
...
generic
...
package Ada.Containers.Ordered_Sets is
...
function "=" (Left, Right : Set) return Boolean;
MJH:
As I mentioned above, we have been in discussion about adding an
operation called Equivalent, that is similar to "=" except that it
compares each element for equivalence (using "<") instead of element
"=".
The ARG should also decide whether to bring back the lexicographical
comparison operators for sets ("<", ">", etc), since Equivalent is
defined as "not (L < R) and not (R < L)".
ENDMJH.
...
procedure Replace (Container : in out Set;
New_Item : in Element_Type);
...
procedure Replace (Container : in Set;
Position : in Cursor;
By : in Element_Type);
MJH:
This new cursor-based replace operation for sets is named in a manner
inconsistent with the rest of this API. It should be named
Replace_Element. (Note that the element parameter is named By, not
New_Item. The parameter name By is always used as the element parameter
of the cursor-based operation named Replace_Element.)
ENDMJH.
...
generic
...
package Generic_Keys is
...
procedure Replace (Container : in out Set;
Key : in Key_Type;
New_Item : in Element_Type);
MJH:
This operation needs to be removed from this API.
First of all, there's no guarantee that parameter Key matches the
key-part of parameter New_Item. This is a set and so the ultimate
arbiter of position within the set is the key-part of the element
itself. In that case, you might as well just say:
Replace (Container => S, New_Item => E);
A key-based replace doesn't buy you anything, since the New_Item
parameter already has a key.
But let's suppose someone really wants to use a key-based replace
operation. That means there are two possibilities: either the Key
matches the key-part of New_Item, or it doesn't.
If it matches the key-part of New_Item (and Randy has stated this is the
normal case), then passing the key as a separate parameter is entirely
redundant.
If the key doesn't match, then you must search for the node that matches
Key, remove it from the set, assign the value of New_Item to the element
on that node, and then reinsert that node. But of course that
reinsertion might fail (since the key-part of New_Item might already be
in the set), and if so raise P_E.
If you really want to change the key-part of an element, then you can do
that using the existing key-based Find and cursor-based Replace_Element
operations:
Replace_Element (S, Keys.Find (S, K), By => E);
However, even that's kind of dubious. The only thing Replace_Element
really saves is that the node doesn't have to be deallocated and then
reallocated. You might as well just say:
Delete (S, K);
Insert (S, E);
The bottom line is that Generic_Keys.Replace must be removed from this
API. It provides no new functionality, and only adds unnecessary
clutter.
ENDMJH.
...
procedure Checked_Update_Element
(Container : in out Set;
Position : in Cursor;
Process : not null access
procedure (Element : in out Element_Type));
MJH:
This operation should be named just "Update_Element", not
"Checked_Update_Element".
There's no need to say that an operation is "checked," since that's
implied. All operations are checked in some way or another.
It also doesn't need to be named "Checked...", since unlike the
Update_Element for other containers, this set operation has an extra
parameter for the container. Checking is implied by the extra
parameter.
Yet another clue that this Update_Element for sets is special is that it
is declared in a special place, inside Generic_Keys.
So saying that this operation is "checked" doesn't tell you
anything that you don't already know. It just adds unnecessary
syntactic overhead.
Another issue is that it's inconsistent with Replace_Element. That
operation carries an extra container parameter too, yet it's not named
Checked_Replace_Element (or Checked_Replace). If it's obvious that
Replace_Element is checked, then it should also be obvious that
Update_Element is checked.
Please change the name of this operation to "Update_Element".
ENDMJH.
end Generic_Keys;
private
... -- not specified by the language
end Ada.Containers.Ordered_Sets;
...
procedure Iterate
(Container : in Set;
Process : not null access procedure (Position : in Cursor));
Invokes Process.all with a cursor that designates each element in Container.
Program_Error is propagated if:
* Process.all attempts to insert or delete elements from Container; or
* Process.all finalizes Container; or
* Process.all calls Move with Container as a parameter.
AARM Note:
This check takes place when the operations that insert or delete elements,
etc. are called.
See Iterate for vectors for a suggested implementation of the check.
End AARM Notes.
MJH:
See my previous comments. There should be no requirement for a check to
determine whether a set has been modified during passive iteration.
As usual, if checks are desired then they should be enabled by an
assertion. The ordered set is interesting because (at least in the
reference implementation) it uses both looping and recursion to
implement passive iteration:
procedure Generic_Iteration (Tree : in Tree_Type) is
procedure Iterate (P : Node_Access) is
X : Node_Access := P;
begin
while X /= Null_Node loop
Iterate (Left (X));
Process (X);
X := Right (X);
end loop;
end Iterate;
begin
Iterate (Tree.Root);
end Generic_Iteration;
Again, it really doesn't matter much if the tree is modified, since
eventually the iteration terminates when it reaches the bottom of the
tree.
However, if tree modification is a concern then you can insert some
assertion checks (as many as the vendor feels is necessary):
procedure Iterate (P : Node_Access) is
X : Node_Access := P;
begin
while X /= Null_Node loop
pragma Assert (Is_Valid (Tree, X));
Iterate (Left (X));
pragma Assert (Is_Valid (Tree, X));
Process (X);
pragma Assert (Is_Valid (Tree, X));
X := Right (X);
end loop;
end Iterate;
The Is_Valid function would be defined something like this:
function Is_Valid
(Tree : Tree_Type;
Node : Node_Access) return Boolean is
begin
if Tree.Length = 0 then
return False;
end if;
if Tree.Root = Null_Node then
return False;
end if;
if Tree.First = Null_Node then
return False;
end if;
if Tree.Last = Null_Node then
return False;
end if;
if Parent (Tree.Root) /= Null_Node then
return False;
end if;
if Tree.Length > 1 then
null;
elsif Tree.First /= Tree.Last then
return False;
elsif Tree.First /= Tree.Root then
return False;
end if;
if Left (Node) = Null_Node then
null;
elsif Parent (Left (Node)) /= Node then
return False;
end if;
if Right (Node) = Null_Node then
null;
elsif Parent (Right (Node)) /= Node then
return False;
end if;
if Parent (Node) = Null_Node then
if Tree.Root /= Node then
return False;
end if;
elsif Left (Parent (Node)) = Node then
if Right (Parent (Node)) = Node then
return False;
end if;
elsif Right (Parent (Node)) /= Node then
return False;
end if;
return True;
end Is_Valid;
You get the idea. You can use this same technique for all the
containers, to check the validity of the cursor passed to any
cursor-based operation.
ENDMJH.
****************************************************************
From: Randy Brukardt
Sent: Monday, October 3, 2004 xx:xx PM
Here is a listing of the AI-302 updates that I made beyond those
discussed at the meeting. These mostly have come up in more recent
e-mail discussions.
3) Replace and Exclude operations matching the ones in the Ordered_Sets were
added to the Generic_Keys generic, as it is odd that Delete was in there and
not the others.
MJH:
See my previous comments that the key-based Replace operation should not
be part of this API.
ENDMJH.
(The intent is that this package closely match Hashed_Maps
[and Ordered_Maps, if it ever is defined] - as many operations on keys in
Hashed_Maps should be represented here as possible.)
MJH:
The intent of Sets.Generic_Keys is *not* to make a set look like a map
(we have maps for that), but rather to allow key-based manipulation of a
set.
It is always the case that the identity of the container as a *set* is
preserved, even when using the operations in Generic_Keys. The nested
package is there to allow users to take advantage of composite element
types which have a distinctive key component.
ENDMJH.
Insert and Include were
omitted, as there could be no guarantee that the Key passed in matches the
one in the Element passed in. (We could check, of course, but that seems
like going too far; moreover, it's hard to imagine how these could be used.)
MJH:
But that's true of Replace as well.
ENDMJH.
Replace simply doesn't worry about it; it is defined in terms of Replace
(see below), replacing the element referred to by the Key. Thus it works
similarly to Checked_Update_Element.
MJH:
See my previous analysis. There is no reason to have a key-based
replace operation, since the addition of a cursor-based replace
operation obviates its need.
ENDMJH.
Replace (Container, Cursor, New_Item) also has been added to the Set itself,
as there is not a Replace_Element for a set. This tries to replace in place,
but will do an insert/delete if necessary.
MJH:
This operation should be named "Replace_Element", not "Replace".
ENDMJH.
5) The index forms of Element, Replace_Element, Query_Element, and
Update_Element took Index_Type'Base for some reason, but passing No_Index
raises Constraint_Error. So I changed these to Index_Type, so that the
specification doesn't allow No_Index to be passed.
MJH:
I originally used IT'Base because the implementation must check that the
index parameter satisfies the constraint that Index <= Last_Index (V),
so passing in a constrained subtype didn't really buy you anything.
ENDMJH.
6) Added an erroneous case for abuse of the Process procedure of
Query_Element and Update_Element. This usually looks like:
Execution also is erroneous if the called Process procedure of a call to
Query_Element or Update_Element executes an operation that causes the
Position cursor of Query_Element or Update_Element to become invalid.
For lists, maps, and sets, the only problem occurs if the element is deleted
directly, or if the container is finalized (via Unchecked_Deallocation).
Insertions and other Deletions don't matter, as the nodes are logically
separate.
For vectors, the rule also includes ambiguous cursors. An insert or delete to
the left of the cursor will move the elements; if the element is passed by
reference, that will clobber the element being operated on with unknown
effects. We don't want to require that optimization is off in Process
subprograms! The vector version also requires wording to cover the index
version of the routines.
I'd like to suggest that we consider adding a check that the element being
processed is not deleted by the Process procedure. This check requires only a
bit per node (or a short list of elements in process), and covers all of the
new dangerous cases for most of the containers.
MJH:
See my previous analysis. This suggested implementation won't work in
the presence of multiple reader tasks, which a container must support.
We already have a rule that says simultaneous reading from and writing
to a container has undefined behavior. This rule applies whether it is
one task or multiple tasks doing the simultaneous reading and writing.
Hence there should be no requirement to perform any such checks.
ENDMJH.
(Bad use of
Unchecked_Deallocation is hardly new to the containers, and Move will not
actually cause problems in practice, as the nodes are not changed, just the
container that they belong to.) Deleting yourself requires contortions (the
Process routine does not have a cursor to use for this operation), and, since
it damages the element parameter, the effects could be widespread. The check
also would prevent calling Update_Element on the same element again, which
would have different results depending on the parameter passing mode (and
which makes the check cheaper). The overhead of the check would only apply to
the various Deletes and Update_Element; no other routines would need to check.
The text would be:
If the Process procedure deletes the element designated by Cursor, or calls
Update_Element on Cursor, Program_Error is raised.
AARM Note: This check has to be done in the code for Delete and
Update_Element, of course.
Making vector Update_Element safe would also require checking for any
operations that would make the cursor ambiguous. (That's a bounded error in
other cases.)
MJH:
See my previous comments. The checks described above are not
implementable.
ENDMJH.
8) Delete for cursors does nothing if the cursor is No_Element for Lists,
Maps, and Sets. (Matt says this was intended to model the effect of
Unchecked_Deallocation.) Delete for cursors in Vectors, on the other hand,
raised Constraint_Error in this case. I changed the wording for Delete for
cursors in Vectors to be consistent with the other three.
MJH:
The reasons are historical. The index-based form of delete came first,
and then (as now), specifying an index value outside of the active range
of elements raised Constraint_Error. (The model is that a vector is
roughly the same as an array that can expand or contract.)
When we added the cursor-based operations in Phoenix, I probably defined
the semantics of the cursor-based delete operation for vector to match
the semantics of the index-based delete operation, rather than matching
the semantics of the cursor-based delete for other containers.
Note that I have a comment above (in the vectors section) requesting the
ARG to confirm the semantics of the index-based delete. (That operation
raises C_E if the index is outside of the active range of elements.)
ENDMJH.
10) Added an AARM note to the effect that when we say "unspecified" in this
clause (A.17), we don't mean "erroneous". If we meant "erroneous", we said
that. And included some ramifications of that (checking must not be
suppressed; don't create dangling pointers by assuming behavior of generic
formals).
MJH:
Maybe we should say that modifying a container during passive iteration,
or during Update_Element, etc, has "unspecified" behavior? I am not in
favor of requiring any sort of check (especially since the suggested
implementation won't work if there are multiple reader tasks).
ENDMJH.
15) Wording was added to Iterate for each container to say that Program_Error
is raised if the Process routine calls an operation that will modify or
reorder the container. Each container needs slightly different wording for
various reasons (nodes can be reordered in Lists; rehashing in a Map would
change the order).
MJH:
This requirement should be removed. There is no way to make such a
check that works when there are multiple reader tasks.
This requirement is redundant anyway, since we already have a meta-rule
that says behavior is unspecified if a container is simultaneously read
from and written to. This rule applies irrespective of the number of
tasks (including the case of the same task).
ENDMJH.
This decision grew out of a discussion between Matt and me as to what exactly
the passive iterator should allow.
We both agreed that trying to implement a passive iterator that could stand
insertions and deletions of elements was hard.
MJH:
It varies by container. My comments above highlight some of the
differences among the containers. The problem case for a list, for
example, is deleting the current node; other than that, pretty much
anything goes.
At best, container modification during passive iteration is
non-portable. This API should not require that container modification
during passive iteration be allowed, but it should not require that
modification during passive iteration be prevented, either.
In particular, there should be no requirement to perform any check
during passive iteration to detect modification.
ENDMJH.
Moreover, if the user needs to
do that, they can use an active iterator (that is, a loop with explicit
cursors) to do so. So, we agreed that inserting or deleting elements from
within a passive iterator was bad, and there is no need or intent to support
it.
MJH:
Whether or not it's "bad" depends on the implementation. If you delete
the current node during passive iteration over a list, for example, then
very likely the implementation will read some deallocated memory to get
the pointer to the next node. That would clearly be bad, unless the
implementation stores deleted nodes in a cache, or arranges for
deallocated memory to retain its state (say, by using a special storage
pool), in which case there would be no problem.
On the other hand, if nodes other than the current node are deleted,
then there isn't any problem. (But again, it depends on the
implementation.)
So no, this API should not require implementations to support container
modification during passive iteration, but that's a far different thing
from requiring that an implementation prevent such modification.
There is a meta-rule that says the container must support manipulation
by multiple reader tasks. There is no way (that I know of, at least) to
check that a container isn't modified during passive iteration in a way
that doesn't violate this meta-rule. That's why there should be no
requirement for such a check, since it's impossible to implement.
ENDMJH.
The main undecided issue is what to do if the user does indeed make a mistake
and insert or delete an element from the container during a passive iterator.
There seem to be 4 possibilities:
1) Specified results (it works in some specified way);
MJH:
For the vector and list, you could probably do that without too much
implementation burden. (Well, I take that back. Randy and Pascal are
implementing their vectors using a two-tier structure and a skip list,
respectively, so I can't say what burden would entail.)
In any event, it's probably not worth the bother of specifying allowed
modifications during passive iteration, since the user can just manually
use a cursor and an explicit loop.
ENDMJH.
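[Editor's note: The "cursor and an explicit loop" alternative mentioned above
can be sketched as follows. This is only an illustration. It assumes the
Ada.Containers.Doubly_Linked_Lists package as later standardized in Ada 2005,
whose names may differ from the draft under discussion; the procedure and
variable names are hypothetical.]

```ada
with Ada.Containers.Doubly_Linked_Lists;
with Ada.Text_IO;

procedure Active_Iteration_Demo is

   package Int_Lists is new Ada.Containers.Doubly_Linked_Lists (Integer);
   use Int_Lists;

   Items     : List;
   Position  : Cursor;
   Successor : Cursor;

begin
   for I in 1 .. 6 loop
      Items.Append (I);
   end loop;

   --  Active iteration: the loop owns the cursor, so it can fetch the
   --  successor before the current node is deleted.
   Position := Items.First;
   while Has_Element (Position) loop
      Successor := Next (Position);
      if Element (Position) mod 2 = 0 then
         Items.Delete (Position);  --  Position is set to No_Element
      end if;
      Position := Successor;
   end loop;

   Ada.Text_IO.Put_Line
     ("Remaining elements:"
      & Ada.Containers.Count_Type'Image (Items.Length));
end Active_Iteration_Demo;
```

Because the successor is saved before Delete is called, the loop never
navigates through a deleted node, which is the problem case for passive
iteration over a list described in the comments above.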
2) Unspecified results (it works, but what it does isn't specified);
MJH:
I'm not sure we can even guarantee that "it works," since this might
overly constrain an implementor (by requiring that he cache nodes, for
example).
ENDMJH.
3) Erroneous (anything goes);
MJH:
If a node gets deallocated, and you refer to that node in order to
navigate to some other node, then you have a dangling reference, so I
assume that falls under the heading of "erroneous."
ENDMJH.
4) Check for bad cases and raise an exception.
MJH:
This won't work if there are multiple reader tasks.
ENDMJH.
(1) is clearly too burdensome on the implementation, and besides, we don't
want it.
(2) would ensure that the program wouldn't crash, but otherwise the results
wouldn't be portable.
(3) would allow anything; implementers could ignore the possibility.
(4) would be the most portable, but there are concerns about overhead.
MJH:
It's not even clear to me how (4) could even be implemented, since
there's no way to perform a check that works in the presence of multiple
reader tasks.
ENDMJH.
I originally wrote (2) using the wording: "Which cursors are presented to
Process is unspecified if..." But that seems to be a burden on
implementations for little benefit.
I object to (3), because users *will* make this mistake, and likely
implementations of the iterators would have very bad effects. If the node
that the iterator was holding onto was deleted, it probably would be
Unchecked_Deallocated, the memory might be reused, and when the pointers are
walked, just about anything could happen.
MJH:
Not necessarily. You yourself have already stated you plan on using a
node cache, which means nodes would retain their state. You could
arrange to put a newly-deleted node at the end of the queue, such that
it retains its state for as long as possible. You could even use a
generic formal constant (or some other mechanism) to control how large
the cache is.
Note that I have already implemented some of the validity checking
described in an earlier comment, and I was able to successfully detect a
dangling reference without doing anything special. The validity
checking would be even more robust were I to use GNAT's special
Debug_Pool storage pool.
ENDMJH.
(4) seemed to have too much overhead, but once we stopped trying to support
any insertion or deletion into the container, the cost became quite
reasonable. All the implementation of the check would need is a counter
(8 bits probably is enough) in each container. When an Iterate starts, the
counter is incremented; when it completes, the counter is decremented. Each
of the operations on the list of problem operations checks that the counter
is zero, raising Program_Error if the counter is nonzero.
MJH:
This won't work in the presence of multiple reader tasks, which a
container must support.
This API shouldn't be tied to a particular implementation technique
anyway.
ENDMJH.
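[Editor's note: The counter-based check described above can be sketched as
follows. This is a minimal illustration, not any implementer's actual code.
Toy_Container, Busy, Append, and Bad_Process are hypothetical names, and
Iterate takes the container with mode "in out" here only so that the toy
record can be updated directly, whereas the draft declares it with mode "in".]

```ada
with Ada.Text_IO;

procedure Tamper_Check_Demo is

   type Int_Array is array (1 .. 10) of Integer;

   --  Hypothetical toy container. It is declared limited so that it is
   --  always passed by reference and the counter is shared by all views.
   type Toy_Container is limited record
      Items : Int_Array := (others => 0);
      Last  : Natural   := 0;
      Busy  : Natural   := 0;  --  number of Iterate calls in progress
   end record;

   procedure Append (C : in out Toy_Container; Item : Integer) is
   begin
      if C.Busy > 0 then
         raise Program_Error;  --  insertion attempted during Iterate
      end if;
      C.Last := C.Last + 1;
      C.Items (C.Last) := Item;
   end Append;

   procedure Iterate
     (C       : in out Toy_Container;
      Process : not null access procedure (Item : Integer)) is
   begin
      C.Busy := C.Busy + 1;
      begin
         for I in 1 .. C.Last loop
            Process (C.Items (I));
         end loop;
      exception
         when others =>
            C.Busy := C.Busy - 1;  --  restore the counter, then propagate
            raise;
      end;
      C.Busy := C.Busy - 1;
   end Iterate;

   Box : Toy_Container;

   procedure Bad_Process (Item : Integer) is
   begin
      Append (Box, Item);  --  modifies the container being iterated
   end Bad_Process;

begin
   Append (Box, 1);
   Append (Box, 2);
   Iterate (Box, Bad_Process'Access);
   Ada.Text_IO.Put_Line ("Not reached");
exception
   when Program_Error =>
      Ada.Text_IO.Put_Line ("Program_Error raised by the check");
end Tamper_Check_Demo;
```

Note that Busy is ordinary unprotected state that Iterate modifies. Two tasks
calling Iterate on the same container at once would race on it, which is the
multiple-reader objection raised in the comments above.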
(We don't have to worry about tasking issues, as the container object is
inside of the Iterate call the entire time. If some other task makes a call
during that time, we have bad use of shared variables, and we don't care what
happens. In fact, what will happen is that Program_Error would be raised,
which is probably a good thing.)
MJH:
We certainly do have to worry about tasking issues! It is certainly
*not* an error if multiple tasks all call Iterate simultaneously.
ENDMJH.
That has very little overhead, because virtually all of the operations in
question allocate or deallocate memory, and thus are expensive anyway; an
additional compare and branch will have no visible impact on performance.
(Sorting and Merging are also expensive; Swap_Links and Splice are the only
exceptions.) Operations that don't modify the container don't need to make
any check.
MJH:
This technique doesn't work if there are multiple reader tasks.
ENDMJH.
This has the advantage of making passive iterators completely safe against
problems caused by what container operations are invoked in Process. (Yes,
calling Unchecked_Deallocation on the container could still cause problems,
but that is covered by other rules of the language -- and even it would raise
Program_Error.) It also means that uses of passive iterators are safely
portable (whereas active iterators could have problems if a dangling cursor
was used) -- which gives them a clear advantage.
MJH:
Well, if we're going to invoke "other rules of the language," then we
should just invoke the rule that says simultaneously reading from and
writing to a container is undefined, irrespective of whether this is one
task or multiple tasks.
ENDMJH.
This check is another one that could be dropped in an "unchecked" container.
MJH:
This check doesn't belong in this API. At a minimum the implementation
of the check described above doesn't work when there are multiple reader
tasks.
ENDMJH.
Thus, I've worded this check into all of the passive iterators.
The wording enumerates the reasons that a check is needed:
"if Process attempts to insert or delete elements into Container; or"
"modifies Container" would be too broad, as it could include replacing the
value of an element.
MJH:
I tend to think of the container and its elements as separate entities.
I often use the term "change the cardinality of the container" to
emphasize the modification of the container itself.
ENDMJH.
We need also to talk about finalization and about calling Move, as the
current wording only talks about cursors being passed to operations, not
something that happens *during* an operation. Moreover, once we decide to
have a check, including that check in the body of Finalize and Move is not
difficult.
MJH:
I am against mandating any such check.
ENDMJH.
****************************************************************
MJH:
The following comments apply to Pascal's new set API:
AI-20302-07-addendum-set-crlf.txt
ENDMJH.
A.17.6 Sets
The language-defined packages Containers.Hashed_Sets and
Containers.Ordered_Sets provide private types Set and Cursor, and a set of
operations for each type. A hashed set container allows an arbitrary type to
be stored in a set. An ordered set container orders its elements per a
specified relation.
MJH:
Well, technically both kinds of set (indeed, all kinds of containers)
allow an "arbitrary type" to be stored, not just the hashed set.
ENDMJH.
This section describes the declarations that are common to both kinds of
sets. See A.17.7 for a description of the semantics specific to
Containers.Hashed_Sets and A.17.8 for a description of the semantics specific
to Containers.Ordered_Sets.
The type Set is used to represent sets. The type Set needs finalization (see
7.6).
A set contains elements. Set cursors designate elements. There exists an
equivalence relation on elements, whose definition is different for hashed
sets and ordered sets. A set never contains two or more equivalent elements.
The *length* of a set is the number of elements it contains.
Each nonempty set has two particular elements called the *first element* and
the *last element* (which may be the same). Each element except for the last
element has a *successor element*. If there are no other intervening
operations, starting with the first element and repeatedly going to the
successor element will visit each element in the set exactly once until the
last element is reached. The exact definition of these terms is different for
hashed sets and ordered sets.
MJH:
But do realize that only the ordered set has a Last_Element selector and
a Delete_Last modifier. I'm not sure any discussion of last element is
even relevant, since the successor of last is well-defined (the cursor
has the value No_Element).
ENDMJH.
Empty_Set represents the empty Set object. It has a length of 0. If an object
of type Set is not otherwise initialized, it is initialized to the same
value as Empty_Set.
No_Element represents a cursor that designates no element. If an object of
type Cursor is not otherwise initialized, it is initialized to the same
value as No_Element.
function "=" (Left, Right : Set) return Boolean;
If Left and Right denote the same set object, then the function returns True.
If Left and Right have different lengths, then the function returns False.
Otherwise, for each element E in Left, the function returns False if an
element equivalent to E is not present in Right. If the function has not
returned a result after checking all of the elements, it returns True. Any
exception raised during evaluation of element equivalence is propagated.
MJH:
As I have already mentioned, equality for sets is (er, should be)
defined in terms of element equality, not equivalence. This is true for
all containers, so sets shouldn't be any different.
ENDMJH.
...
procedure Replace (Container : in out Set;
New_Item : in Element_Type);
If Length (Container) equals 0, then Constraint_Error is propagated.
Otherwise, Replace checks if an element equivalent to New_Item is already in
the set. If a match is found, that element is replaced with New_Item;
otherwise, Constraint_Error is propagated.
MJH:
I'm not sure why that first sentence is necessary, since the last
sentence includes the case of Length(C) = 0.
ENDMJH.
...
procedure Iterate
(Container : in Set;
Process : not null access procedure (Position : in Cursor));
Iterate calls Process.all with a cursor that designates each element in
Container, starting with the first node and moving the cursor according to
the successor relation. Any exception raised by Process.all is propagated.
Program_Error is propagated if:
* Process.all attempts to insert or delete elements from Container; or
* Process.all finalizes Container; or
* Process.all calls Move with Container as a parameter.
AARM Note:
This check takes place when the operations that insert or delete elements,
etc. are called.
See Iterate for vectors for a suggested implementation of the check.
End AARM Notes.
MJH:
I don't know how you expect to implement such a requirement.
No, it's not good enough to modify the state of the container to
indicate that iteration is in progress, since the container must work in
the presence of multiple reader tasks. (Mixing readers and writers is
of course a no-no.)
Get rid of the requirement for a check, and say that modifying the
container (that is, changing its cardinality) during iteration is
erroneous (or unspecified, or whatever).
We already have a meta-rule that says behavior isn't specified if the
container is simultaneously queried and modified. This rule applies
even if the reader and writer are the same task (as would be the case of
deleting an element from the container while iteration is in progress).
The best way to handle this "problem" is to use assertion checks (or
perhaps some kind of preprocessor) that can be controlled by the user.
The assertions can use knowledge of the representation of the internal
storage node and the characteristics of the storage pool. For example,
here's a set of assertions that detect an attempt to delete a node that
has already been deleted:
pragma Assert (Tree.Length > 0);
pragma Assert (Tree.Root /= Null_Node);
pragma Assert (Tree.First /= Null_Node);
pragma Assert (Tree.Last /= Null_Node);
pragma Assert (Parent (Tree.Root) = Null_Node);
pragma Assert ((Tree.Length > 1)
or else (Tree.First = Tree.Last
and then Tree.First = Tree.Root));
pragma Assert ((Left (Node) = Null_Node)
or else (Parent (Left (Node)) = Node));
pragma Assert ((Right (Node) = Null_Node)
or else (Parent (Right (Node)) = Node));
pragma Assert (((Parent (Node) = Null_Node) and then (Tree.Root = Node))
or else ((Parent (Node) /= Null_Node) and then
((Left (Parent (Node)) = Node)
or else (Right (Parent (Node)) = Node))));
See Delete_Node_Sans_Free in a-crbtgo.adb. Similar checks immediately
following the call to Process by Iterate should be adequate to detect
most problems.
ENDMJH.
...
procedure Replace (Container : in out Set;
Key : in Key_Type;
New_Item : in Element_Type);
Equivalent to Replace (Container, Find (Container, Key), New_Item).
MJH:
This is a useless operation, and it should be removed from this API.
I'll include my analysis in my review of the post-Madison final draft.
[MJH -- See my earlier comments.]
Note also that the cursor-based Replace operation by which this
key-based replace is implemented hasn't been mentioned anywhere above.
ENDMJH.
...
A.17.7 The Package Containers.Hashed_Sets
Static Semantics
The package Containers.Hashed_Sets has the following declaration:
generic
type Element_Type is private;
with function Hash (Element : Element_Type) return Hash_Type;
with function Equivalent_Elements (Left, Right : Element_Type)
return Boolean;
MJH:
We have already discussed the fact that you need to pass elem_t "=" too,
in order to implement set container "=".
We have also discussed the fact that I think the equivalence function
should be named Equivalent_Keys, not Equivalent_Elements.
ENDMJH.
...
function Equivalent_Elements (Left, Right : Cursor)
return Boolean;
function Equivalent_Elements (Left : Cursor;
Right : Element_Type)
return Boolean;
function Equivalent_Elements (Left : Element_Type;
Right : Cursor)
return Boolean;
MJH:
See my comment above. These should all be named Equivalent_Keys.
The model is that an element is either a composite type with a key-part
component; or, the element doesn't have a key-part, meaning that the
element is all-key. But in either case there is a key, even if the
"key" is the element itself.
(Another argument is that a set is basically the same as a map -- the
only difference is where the key lives. You should be able to switch
between a set and a map without too much pain, and having the set use
the name Equivalent_Elements while the map uses Equivalent_Keys is a
gratuitous difference.)
ENDMJH.
...
generic
type Key_Type (<>) is limited private;
with function Key (Element : in Element_Type) return Key_Type;
with function Hash (Key : Key_Type) return Hash_Type;
with function Equivalent (Left : Key_Type; Right : Element_Type)
return Boolean;
MJH:
Now you have changed this too! It should be named Equivalent_Keys. The
whole point of this nested package is to allow key-based set
manipulation for (composite) element types that have a key-part.
ENDMJH.
package Generic_Keys is
...
procedure Replace (Container : in out Set;
Key : in Key_Type;
New_Item : in Element_Type);
MJH:
See my comments above. This operation should be removed from this API.
ENDMJH.
...
function Equivalent_Keys (Left : Cursor;
Right : Key_Type)
return Boolean;
function Equivalent_Keys (Left : Key_Type;
Right : Cursor)
return Boolean;
MJH:
Here's all the proof you need that the formal operation should be named
Equivalent_Keys, too.
ENDMJH.
end Generic_Keys;
private
... -- not specified by the language
end Ada.Containers.Hashed_Sets;
...
procedure Iterate
(Container : in Set;
Process : not null access procedure (Position : in Cursor));
In addition to the semantics described in A.17.6, Program_Error is propagated
if Process.all calls Reserve_Capacity.
MJH:
See my earlier comments, about the fact that this requirement is
unnecessary, and unimplementable.
We already have a meta-rule that says the container cannot be read-from
and written-to simultaneously. The rule applies whether this is one
task or more than one task.
In this particular case, you can detect whether a node has been moved
(as a result of rehashing) like this:
for I in Container.Buckets'Range loop
...
Process (Cursor'(Container'Unchecked_Access, Node));
pragma Assert (Hash (Node.Element) mod Container.Buckets'Length = I);
...
end loop;
ENDMJH.
...
function Equivalent_Keys (Left : Cursor;
Right : Key_Type) return Boolean;
Equivalent to Equivalent_Keys (Key (Left), Right).
function Equivalent_Keys (Left : Key_Type;
Right : Cursor) return Boolean;
Equivalent to Equivalent_Keys (Left, Key (Right)).
MJH:
This is inconsistent with the name for the generic formal function.
ENDMJH.
...
A.17.8 The Package Containers.Ordered_Sets
Static Semantics
The package Containers.Ordered_Sets has the following declaration:
generic
...
package Ada.Containers.Ordered_Sets is
...
generic
...
package Generic_Keys is
...
procedure Replace (Container : in out Set;
Key : in Key_Type;
New_Item : in Element_Type);
MJH:
I have already stated this Replace operation should be removed from this
API.
ENDMJH.
...
end Generic_Keys;
private
... -- not specified by the language
end Ada.Containers.Ordered_Sets;
****************************************************************
From: Randy Brukardt
Sent: Wednesday, October 27, 2004 10:58 PM
I'm going to only answer a few of these points, because my opinions are
already noted in the AI and the attached e-mail.
But several of the things Matt says are completely new and for the most
part, wrong, and they need to be addressed.
I'm going to answer this in several smaller messages so that we can make
reasonable threads.
...
> A.17 Containers
>
> ...
>
> Note that the language already includes several requirements that are
> important to the use of containers. First, library packages must be
> reentrant - multiple tasks can use the packages as long as they operate on
> separate containers. Thus, it is only necessary for a user to protect a
> container if a single container needs to be used by multiple tasks.
>
>
> MJH:
>
> We need to be clear here about multithreading issues, since that last
> sentence is wrong.
No, the last sentence exactly matches the language of A(3). And this
paragraph has been here forever, as has A(3).
> The only problem case is when there are multiple writers, or a single
> writer and one or more readers. (The reader and writer can also be the
> same task.)
>
> It is definitely *not* an error for multiple readers to access the same
> container all simultaneously.
Yes it is, because it violates A(3).
> In particular, it is perfectly acceptable (in fact, the API is designed
> to facilitate this) for multiple tasks to be iterating over the same
> container object, using either cursors or the passive iterator.
Maybe, but it violates A(3).
I would be very opposed to trying to repeal A(3) for these packages. There
are a number of reasons for that:
1) It would be different than all other Ada-defined packages. That would
add to user confusion.
2) It would prevent implementations from doing any sort of modifications
on reading. For instance, the common technique of making most recently
referenced elements to be more accessible in a hash table couldn't be used.
Nor could caches or reference counts. We had similar restrictions in some
operations in Claw, and it proved to be very constraining.
3) It would require a lot of wording to implement. We'd have to define
precisely which operations are reading and which are writing; that would be
complex to do.
4) It would make it far more likely for users to try to access containers
from multiple tasks and would lead to errors. For instance, your proposed
semantics wouldn't allow one task to write a container and another to read
it, but it's likely that users would try to do so.
The current rule is quite simple: if you want to use multiple tasks on a
single container (or any other entity of a predefined type), wrap it in a
protected object. Any other rule is going to be far more complex, both to
use and to understand.
We've always expected that a secondary standard would consider task-safe
"protected" containers. But the locking needed for that is quite expensive,
and it certainly shouldn't be mandated.
****************************************************************
From: Nick Roberts
Sent: Thursday, October 28, 2004 10:45 AM
> The current rule is quite simple: if you want to use multiple tasks on a
> single container (or any other entity of a predefined type), wrap it in a
> protected object. Any other rule is going to be far more complex, both to
> use and to understand.
I agree with Randy on this issue. In particular, calling protected
functions to read a wrapped container object should be something that
multiple tasks can do in parallel (provided no protected procedures are
in execution at the same time).
I can find no functions in the latest AI (v1.10) which would themselves
be likely to be implemented in such an impure manner that parallel calls
could interfere with each other or the integrity of the container's state,
but I suspect there is a possibility that the Find (plus Contains and
Has_Element) functions /might/ be implemented like this, as well as Floor
and Ceiling (they might modify the container's state in an effort to
increase the speed of subsequent searches).
How easy would it be to wrap a container in Ada 2005? I guess each
operation would have to be explicitly wrapped (and in many cases
this might be quite right, too). Could this be done by renaming?
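[Editor's note: The "wrap it in a protected object" approach discussed above
can be sketched as follows, with each needed operation wrapped explicitly.
This is only an illustration. It assumes Ada.Containers.Vectors as later
standardized in Ada 2005; Shared, Count, Get, and Writer are hypothetical
names.]

```ada
with Ada.Containers.Vectors;
with Ada.Text_IO;

procedure Protected_Wrapper_Demo is

   package Int_Vectors is new Ada.Containers.Vectors
     (Index_Type => Positive, Element_Type => Integer);

   --  Each operation that tasks need is wrapped by hand.
   protected Shared is
      procedure Append (Item : Integer);
      function Count return Ada.Containers.Count_Type;
      function Get (Index : Positive) return Integer;
   private
      Data : Int_Vectors.Vector;
   end Shared;

   protected body Shared is

      procedure Append (Item : Integer) is
      begin
         Data.Append (Item);  --  writers get exclusive access
      end Append;

      function Count return Ada.Containers.Count_Type is
      begin
         return Data.Length;
      end Count;

      function Get (Index : Positive) return Integer is
      begin
         return Data.Element (Index);
      end Get;

   end Shared;

   task type Writer;

   task body Writer is
   begin
      for I in 1 .. 100 loop
         Shared.Append (I);
      end loop;
   end Writer;

begin
   declare
      Writers : array (1 .. 4) of Writer;
   begin
      null;  --  the block completes only after all four writers terminate
   end;

   Ada.Text_IO.Put_Line
     ("Elements stored:"
      & Ada.Containers.Count_Type'Image (Shared.Count));
end Protected_Wrapper_Demo;
```

Protected procedures run with exclusive access, so the four writers cannot
corrupt the vector and all 400 appends are retained. Protected functions may
run concurrently with one another, so whether that is safe depends on the
container's read operations not modifying its state, which is the point under
discussion in this thread.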
****************************************************************
From: Randy Brukardt
Sent: Wednesday, October 27, 2004 11:58 PM
Another response to Matt's review:
> procedure Iterate
> (Container : in List;
> Process : not null access procedure (Position : in Cursor));
>
> Invokes Process.all with a cursor that designates each node in Container.
> Any
> exceptions raised during Process are propagated.
> Program_Error is propagated if:
> * Proces