Version 1.3 of ai05s/ai05-0044-1.txt

Unformatted version of ai05s/ai05-0044-1.txt version 1.3
Other versions for file ai05s/ai05-0044-1.txt

!standard A.18(4/2)          07-11-29 AI05-0044-1/03
!standard A.18.2(231/2)
!standard A.18.3(145/2)
!standard A.18.6(56/2)
!standard A.18.8(66/2)
!standard A.18.9(79/2)
!standard A.18.16(5/2)
!standard A.18.16(9/2)
!class binding interpretation 07-04-02
!status ARG Approved 7-0-2 06-11-10
!status work item 07-04-02
!status received 07-01-24
!priority Medium
!difficulty Medium
!qualifier Omission
!subject Equivalence and equality in containers
!summary
The notion of "strict weak ordering" is defined. All "<" formal subprograms are required to have a strict weak ordering.
"=" formal functions for the Set containers are required to return False for non-equivalent elements.
!question
The equivalence relationship of ordered containers is assumed to be transitive. However, that does not follow from the requirements on "<" and the definition of equivalence. Consider the following function:
type T1 is (X, Y, Z);
function "<" (Left, Right : T1) return Boolean is -- This function implements the ordering: -- X < Z -- No other pairs are comparable. begin return Left = X and Right = Z; end "<";
This function meets the requirements on "<". But the equivalence relationship derived from it is not transitive. This could cause many bizarre effects in a container.
Similar issues arise for "<" as passed to various sorting routines. If "<" allows incomparable items, it's impossible to understand what the meaning of "smallest first" is.
A related issue is that the requirements on "=" for sets are not related to equivalence. That means that "=" on a set must take O(N**2), as checking only equivalent elements would be making an unwarranted assumption.
!recommendation
(See Summary.)
!wording
Define "strict weak ordering" after A.18(4/2):
When a formal operator "<" is used to provide an ordering for a container, it is generally required to define a strict weak ordering. An operator "<" defines a strict weak ordering if it is irreflexive, asymmetric, transitive, and in addition, if x < y for any values x and y, then for all other values z, (x < z) or (z < y).
Replace the second sentence of A.18.2(231/2) by It should define a strict weak ordering relationship (see A.18); it should not modify Container.
Make similar changes to A.18.3(145/2), A.18.6(56/2), A.18.9(79/2), A.18.16(5/2), and A.18.16(9/2).
Add "If the actual function for the generic formal function "=" returns True for any pair of non-equivalent elements, then the behavior of the container function "=" is unspecified." as a separate paragraph after A.18.8(66/2) and A.18.9(79/2).
!discussion
The existing wording requires a "<" actual function to be irreflexive, asymetric, and transitive. The formal definitions of these terms are: - irreflexive: X < X is False for all X. - asymmetric: if X < Y is True, then Y < X is False. - transitive: if X < Y is True and Y < Z is True then X < Z is True.
Note that the latter two say nothing about the case where X < Y is False. Thus it is OK to have both X < Y = False and Y < X = False and still be asymeteric. Indeed, this is the definition of equivalence, so it is clearly fine. What is not OK is for there to be a value which is equivalent to two other values which are not equivalent (that is, that have "<" returning True).
That is, we also need transitively of equivalence, or more precisely.
if x < y is True, then for all z, (x < z) or (z < y) is True.
!corrigendum A.18(4/2)
Insert after the paragraph:
If the advice suggests that the complexity should be less than O(f(N)), then for any arbitrarily small positive real D, there should exist a positive integer M such that for all N > M, t(N)/f(N) < D.
the new paragraph:
When a formal operator "<" is used to provide an ordering for a container, it is generally required to define a strict weak ordering. An operator "<" defines a strict weak ordering if it is irreflexive, asymmetric, transitive, and in addition, if x < y for any values x and y, then for all other values z, (x < z) or (z < y).
!corrigendum A.18.2(231/2)
Replace the paragraph:
The actual function for the generic formal function "<" of Generic_Sorting is expected to return the same value each time it is called with a particular pair of element values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive; it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the subprograms of Generic_Sorting are unspecified. How many times the subprograms of Generic_Sorting call "<" is unspecified.
by:
The actual function for the generic formal function "<" of Generic_Sorting is expected to return the same value each time it is called with a particular pair of element values. It should define a strict weak ordering relationship (see A.18); it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the subprograms of Generic_Sorting are unspecified. How many times the subprograms of Generic_Sorting call "<" is unspecified.
!corrigendum A.18.3(145/2)
Replace the paragraph:
The actual function for the generic formal function "<" of Generic_Sorting is expected to return the same value each time it is called with a particular pair of element values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive; it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the subprograms of Generic_Sorting are unspecified. How many times the subprograms of Generic_Sorting call "<" is unspecified.
by:
The actual function for the generic formal function "<" of Generic_Sorting is expected to return the same value each time it is called with a particular pair of element values. It should define a strict weak ordering relationship (see A.18); it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the subprograms of Generic_Sorting are unspecified. How many times the subprograms of Generic_Sorting call "<" is unspecified.
!corrigendum A.18.6(56/2)
Replace the paragraph:
The actual function for the generic formal function "<" on Key_Type values is expected to return the same value each time it is called with a particular pair of key values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive. If the actual for "<" behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call "<" and how many times they call it, is unspecified.
by:
The actual function for the generic formal function "<" on Key_Type values is expected to return the same value each time it is called with a particular pair of key values. It should define a strict weak ordering relationship (see A.18). If the actual for "<" behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call "<" and how many times they call it, is unspecified.
!corrigendum A.18.8(66/2)
Insert after the paragraph:
The actual function for the generic formal function Equivalent_Elements is expected to return the same value each time it is called with a particular pair of Element values. It should define an equivalence relationship, that is, be reflexive, symmetric, and transitive. If the actual for Equivalent_Elements behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call Equivalent_Elements, and how many times they call it, is unspecified.
the new paragraph:
If the actual function for the generic formal function "=" returns True for any pair of non-equivalent elements, then the behavior of the container function "=" is unspecified.
!corrigendum A.18.9(79/2)
Replace the paragraph:
The actual function for the generic formal function "<" on Element_Type values is expected to return the same value each time it is called with a particular pair of key values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive. If the actual for "<" behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call "<" and how many times they call it, is unspecified.
by:
The actual function for the generic formal function "<" on Element_Type values is expected to return the same value each time it is called with a particular pair of key values. It should define a strict weak ordering relationship (see A.18). If the actual for "<" behaves in some other manner, the behavior of this package is unspecified. Which subprograms of this package call "<" and how many times they call it, is unspecified.
If the actual function for the generic formal function "=" returns True for any pair of non-equivalent elements, then the behavior of the container function "=" is unspecified.
!corrigendum A.18.16(5/2)
Replace the paragraph:
The actual function for the generic formal function "<" of Generic_Array_Sort is expected to return the same value each time it is called with a particular pair of element values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive; it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the instance of Generic_Array_Sort is unspecified. How many times Generic_Array_Sort calls "<" is unspecified.
by:
The actual function for the generic formal function "<" of Generic_Array_Sort is expected to return the same value each time it is called with a particular pair of element values. It should define a strict weak ordering relationship (see A.18); it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the instance of Generic_Array_Sort is unspecified. How many times Generic_Array_Sort calls "<" is unspecified.
!corrigendum A.18.16(9/2)
Replace the paragraph:
The actual function for the generic formal function "<" of Generic_Constrained_Array_Sort is expected to return the same value each time it is called with a particular pair of element values. It should define a strict ordering relationship, that is, be irreflexive, asymmetric, and transitive; it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the instance of Generic_Constrained_Array_Sort is unspecified. How many times Generic_Constrained_Array_Sort calls "<" is unspecified.
by:
The actual function for the generic formal function "<" of Generic_Constrained_Array_Sort is expected to return the same value each time it is called with a particular pair of element values. It should define a strict weak ordering relationship (see A.18); it should not modify Container. If the actual for "<" behaves in some other manner, the behavior of the instance of Generic_Constrained_Array_Sort is unspecified. How many times Generic_Constrained_Array_Sort calls "<" is unspecified.
!ACATS test
This just defines more generic actuals to provide unspecified behavior, and unspecified behavior is not testable.
!appendix

From: Niklas Holsti
Date: Wednesday, January 24, 2007  8:11 AM

!topic Equivalent keys/elements in Ordered Maps and Sets
!reference Ada 2005 RMA.18.6(55/2)
!reference Ada 2005 RMA.18.9(78/2)
!from Niklas Holsti 07-01-24
!discussion

The generic container packages for Maps and Sets deal with a concept 
called "equivalent keys" or "equivalent elements". This equivalence is 
called an "equivalence relation" in A.18.4(5/2) for Maps and A.18.7(5/2) 
for Sets.

As I understand it, and as Wikipedia defines it, an equivalence relation 
is a reflexive, symmetric and transitive relation: any key or element X 
is equivalent to itself; if X is equivalent to Y then Y is equivalent to 
X; and if X and Y are equivalent and Y and Z are equivalent then X and Z 
are equivalent.
(Wikipedia: http://en.wikipedia.org/wiki/Equivalence_relation)

The concept of equivalent keys or elements is defined differently for 
Hashed and Ordered containers. The question discussed here arises only 
for Ordered containers. The question is whether the "equivalence" 
concept as defined for Ordered containers is certain to be transitive 
and thus whether this "equivalence" really is an equivalence relation. 
If it is not transitive then some container operations can give strange 
results.

For Ordered_Maps (or Ordered_Sets) the RM defines two keys (or elements) 
X and Y to be "equivalent" if they are incomparable, that is, if
X < Y is False and Y < X is False, using the actual comparison operator 
associated with the generic formal "<" operator at instantiation time. 
This is in A.18.6(55/2) and A.18.9(78/2), the paragraphs referenced in 
the headers above.

For clarity and brevity I will denote this "equivalence" by the "~" 
symbol. Thus the RM definition is:

    (X ~ Y)  if and only if (X < Y is False and Y < X is False)

The properties of the "~" relation thus depend on the properties of the 
actual comparison operator associated with the formal "<" operator. I 
will use the symbol "<" for this actual comparison operator too,  as 
there seems no risk of confusion between the two.

The properties required or expected of "<" are defined in A.18.6(56/2) 
for Ordered_Maps and in A.18.9(79/2) for Ordered_Sets. In both cases it 
is expected to be "a strict ordering relationship, that is, to be 
irreflexive, asymmetric, and transitive". I understand this as follows, 
using definitions from Wikipedia:

- irreflexive: X < X is False for all X.
   http://en.wikipedia.org/wiki/Irreflexive

- asymmetric: if X < Y is True, then Y < X is False.
   http://en.wikipedia.org/wiki/Asymmetric_relation
   (This is the "stronger" definition, the "weaker" being only
   "not symmetric". This form implies that "<" is irreflexive,
   making the requirements on "<" partly redundant.)

- transitive: if X < Y is True and Y < Z is True then X < Z is True.
   http://en.wikipedia.org/wiki/Transitive_relation

What do these properties of "<" imply for "~"?

- The "~" relation is obviously symmetric because "and" is
   commutative. This does not depend on any properties of "<".

- The "~" relation is reflexive because "<" is irreflexive.

- The "~" relation is *not* necessarily transitive.

The last claim is supported by this example:

    type T1 is (X, Y, Z);

    function "<" (Left, Right : T1) return Boolean is
    -- This function implements the ordering:
    --    X < Z
    -- No other pairs are comparable.
    begin
       return Left = X and Right = Z;
    end "<";

This function "<" is a strict ordering relation for T1. But the "~" 
relation induced by this "<" is not transitive: we have X ~ Y (because 
both X < Y and Y < X are False) and Y ~ Z (because both Y < Z and Z < Y 
are False) but not X ~ Z (because X < Z is True).

If the above reasoning is correct the conclusion is that the current RM 
requirements on the actual "<" operator for an instantiation of 
Ordered_Maps or Ordered_Keys are not strong enough to ensure that the 
RM-defined "equivalence" concept for keys or elements is an equivalence 
relation. This has three negative effects:

1. It confuses the reader to call something an "equivalence"
    when it isn't an equivalence relation (because it is not
    transitive).

2. As the effects of operations on Maps and Sets are defined in
    terms of this "equivalence", the operations may give
    strange (unexpected) results.

3. The definitions of some operations become ambiguous.

For an example of negative effect (2), strange results, consider the 
function Equivalent_Sets when Ordered_Sets is instantiated with the 
following element type and "<" operator:

   type T2 is (W, X, Y, Z);

    function "<" (Left, Right : T2) return Boolean is
    -- This function implements the ordering:
    --     W < X < Z
    --     W < Y
    -- No other pairs are comparable.
    begin
        case Left is
           when W => return Right /= W;
           when X => return Right = Z;
           when Y => return False;
           when Z => return False;
        end case;
    end "<";

The "equivalence" relation "~" derived from this "<" relation contains 
exactly the following pairs (giving True) and no others (all other pairs 
give False):

    X ~ X,  X ~ Y
    Y ~ Y,  Y ~ X,  Y ~ Z
    Z ~ Z,  Z ~ Y
    W ~ W.

Now consider two objects Left and Right of the type Set from this 
instance of Ordered_Sets. Let the Left set be {X, Z} and the Right set 
be {W, Y}. Note that X ~ Z is False, so the Left set is "legal" with 
respect to A.18.6(5/2), and likewise for the Right set. Let's apply the 
RM definition A.18.6(22/2) to see what Equivalent_Sets (Left, Right) 
should return:

- Left and Right are not the same set object
   so we do not return True at once.

- Left and Right have the same number of elements (2)
   so we do not return False at once.

- Now we must check each element of Left and look for an
   "equivalent" element in Right:
   - Check X from Left: we have X ~ Y and Y is in Right.
   - Check Z from Left: we have Z ~ Y and Y is in Right.
   Thus Equivalent_Sets (Left, Right) returns True.

However, Right contains the element W, which is not "equivalent" to any 
element in Left. I feel that this makes it strange to consider Left and 
Right to be "equivalent sets". Moreover, it means that Equivalent_Sets 
(Right, Left) will return False, showing that Equivalent_Sets is not 
symmetric in this case, another strange result.

These strange results are possible because two different elements of the 
Left set can be "equivalent" to the *same* element of the Right set, 
leaving one or more elements of the Right set unpaired with any 
"equivalent" elements of the Left set.

If the "~" relation is transitive, these strange results cannot happen, 
which is easily proved as follows. If there are two different elements 
L1 and L2 of Left and an element R of Right such that L1 ~ R and L2 ~ R 
then by the symmetry and transitivity of "~" we must have L1 ~ L2 which 
is impossible because no Set can contain two or more "equivalent" 
elements according to A.18.6(5/2).

For an example of negative effect (3), ambiguous definitions, consider 
the definition of the "=" operator on Ordered_Maps.Map in A.18.4(22/2) 
and (23/2). This definition implicitly assumes that each key in the Left 
map is "equivalent" to at most one key in the Right map. This assumption 
is false if "equivalence" of keys is not transitive; then there may be 
many "equivalent" keys in the Right map that could be matched against 
the same key in the Left map, making this definition of "=" ambiguous.

To avoid these problem I think that the RM paragraphs that specify the 
properties expected of the actual "<" operators should be extended to 
say that "equivalence of keys/elements" must be (is expected to be) 
transitive.

Thus, in A.18.6(56/2) the following sentence should be inserted after 
the sentence that ends with "transitive": "It should behave in such a 
manner that equivalence of keys is a transitive relation between keys."

In A18.9(79/2) the following sentence should be inserted after the 
sentence that ends with "transitive": "It should behave in such a manner 
that equivalence of elements is a transitive relation between elements."

This question does not arise for Hashed_Maps or Hashed_Sets because the 
equivalence of keys or elements is there defined directly by the formal 
functions Equivalent_Keys and Equivalent_Elements for which the RM 
specifically states that transitivity is expected, as well as 
reflexivity and symmetry.

****************************************************************

From: Tucker Taft
Date: Wednesday, January 24, 2007  1:51 PM

This seems like a valid point, though it seems unfortunate
to have to impose requirements on the "equivalence relation"
rather than directly on "<".  I'm surprised there isn't
a more directly stated property of "<" that would ensure that
equiv(X,Y) == "not X<Y and not Y<X" is transitive.

Perhaps something like requiring that "<" satisfies
the requirement that "X < Y and not Z < Y implies X < Z".

****************************************************************

From: Matthew Heaney
Date: Wednesday, January 24, 2007  2:44 PM

As I mentioned in my CLA posts, the requisite term is that "<" must 
define a "strict weak ordering", which implies that equivalence is 
transitive (it would be a "partial ordering" otherwise).

The SGI STL page has more info:

<http://www.sgi.com/tech/stl/StrictWeakOrdering.html>

****************************************************************

From: Adam Beneschan
Date: Wednesday, January 24, 2007  3:19 PM

> Perhaps something like requiring that "<" satisfies
> the requirement that "X < Y and not Z < Y implies X < Z".

Well, the definition of "transitive" for equivalence relationships is
that  

   (X equiv Y) and (Y equiv Z) implies (X equiv Z)

so, using the definition of "equivalence" from the RM, this becomes

   not (X < Y) and not (Y < X) and not (Y < Z) and not (Z < Y) implies
   not (X < Z) and not (Z < X)

which is ugly, but I don't know that it's possible to come up with a
shorter version just in terms of "<".

If you want to introduce a new term "comparable", defined as "A and B
are comparable if either A<B is true or B<A is true", then the above
is equivalent to:

   "If two elements X and Z are comparable, then every other element
    is comparable to either X or Z."

Which, since "comparable" just means the same as "not equivalent",
this is really just a fancy way of restating the definition of
transitivity for an equivalence relationship, then stating it as a
contrapositive.  But maybe it would be easier to understand and flow
better than talking about equivalence relationships.  The reason I
don't like it is that I'm used to thinking about the term "comparable"
in the context of a partial ordering, where two non-comparable
elements are just too different to be compared---not in the context of
a strict weak ordering where two non-comparable elements are
*equivalent*.  

Also, I should mention that I learned all this set theory stuff some
time ago, and I remember what an equivalence relation, equivalence
classes, partial orderings, and total orderings are, plus some other
related concepts; and I've never come across the term "strict weak
ordering" before this discussion.  So while my own personal experience
isn't necessarily an indicator of anything, I do think we should
consider how much the term is "common knowledge" before just dumping
it in the RM with no explanation.

****************************************************************

From: Tucker Taft
Date: Wednesday, January 24, 2007  3:52 PM

I would still prefer a requirement that does not
require the circuitous route of defining equivalence
and then requiring it to be transitive.  Did you
consider my proposal of requiring that:

    (X<Y and not Z<Y) implies X<Z

I wonder whether I could use that and the other requirements
to prove that the equivalence relation is transitive.  Let's see:

    The above is equivalent to:

       not (X<Y and not Z<Y) or X<Z

    which is equivalent to:

       not X<Y or Z<Y or X<Z

    which can be rearranged to be:

    (X < Z or Z < Y) or not X<Y

    which is equivalent to:

    (not X<Z and not Z<Y) implies not X<Y

We can swap X and Y in the above and produce:

    (not Y<Z and not Z<X) implies not Y<X

If we combine these two and regroup we get

     (not X<Z and not Z<X) and (not Z<Y and not Y<Z)
         implies (not X<Y and not Y<X)

which is our desired transitivity rule.  QED!

Now I suppose this might be overspecifying, but the
rule I have suggested seems something that any "reasonable"
"<" operator ought to satisfy.  In words it might
be phrased that if X and Y are ordered with respect
to one another, then Z must be ordered with at least
one of them.  A reexpression of the requirement that
more nearly matches these words would be:

    For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)

Comments?

****************************************************************

From: Adam Beneschan
Date: Wednesday, January 24, 2007  4:46 PM

...
> If we combine these two and regroup we get
> 
>      (not X<Z and not Z<X) and (not Z<Y and not Y<Z)
>          implies (not X<Y and not Y<X)
> 
> which is our desired transitivity rule.  QED!
> 
> Now I suppose this might be overspecifying, 

Actually, I don't think so---it appears that the rule you formed is
equivalent to the transitivity rule.

I wrote a small program (in Ada, of course) that assumed there were
three possible elements, A, B, and C, and tested all 64 combinations
of A<B, A<C, B<A, etc., eliminating those that didn't meet the
asymmetry and transitivity requirements; that left 19 combinations.
In all those cases, the statements

  (X equiv Y) and (Y equiv Z) implies (X equiv Z), for all X, Y, Z

and 

  (X < Y) and (not Z < Y) implies (X < Z), for all X, Y, Z

produced the same Boolean result.  (It's nice to be able to work with
Boolean variables, because then the number of cases is small enough
that I can just try them all instead of having to mess around with
Boolean algebra and risk making a mistake.)


> but the
> rule I have suggested seems something that any "reasonable"
> "<" operator ought to satisfy.  In words it might
> be phrased that if X and Y are ordered with respect
> to one another, then Z must be ordered with at least
> one of them.  

Sounds vaguely familiar :-)

> A reexpression of the requirement that
> more nearly matches these words would be:
> 
>     For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)

Yep, I added that to my program, and it's equivalent to the other two.
I kind of like this last formulation since it's more symmetrical than
the first one, which IMHO would make it easier to understand and
apply. 

****************************************************************

From: Randy Brukardt
Date: Wednesday, January 24, 2007  5:15 PM

> Also, I should mention that I learned all this set theory stuff some
> time ago, and I remember what an equivalence relation, equivalence
> classes, partial orderings, and total orderings are, plus some other
> related concepts; and I've never come across the term "strict weak
> ordering" before this discussion.  So while my own personal experience
> isn't necessarily an indicator of anything, I do think we should
> consider how much the term is "common knowledge" before just dumping
> it in the RM with no explanation.

On c.l.a., somone claimed that the term is not commonly used.

In any case, our definition of mathematical terms is given in by the
document defined in 1.3 of the Standard. I tried the online version of it
(www.mathworld.com), and did not find any definition of "strict weak
ordering". That means (presuming I didn't screw up in some way) that we
can't use that term in the RM, as it is not defined by our reference. (Or
we'd have to define it ourselves.)

One of Tucker's fixes looks good enough for this, so I don't think we need
to bother with a definition of "strict weak ordering". (Aside: it would sure
be nice if we could specify such things in the generic contract, rather than
a mess of words. But I don't have any good idea as how to do that.)

****************************************************************

From: Simon Wright
Date: Wednesday, January 24, 2007  5:40 PM

> The SGI STL page has more info:
>
> <http://www.sgi.com/tech/stl/StrictWeakOrdering.html>

Also at http://en.wikipedia.org/wiki/Strict_weak_ordering -- the  
Properties section is interesting, very close to Tucker's  
suggestions. I think that the 4th bullet is the point at which  
Nicholas's bizarre "<" fails to meet the test.

****************************************************************

From: Pascal Leroy
Date: Thursday, January 25, 2007  3:26 AM

Niklas said:

> The question discussed here arises only 
> for Ordered containers. The question is whether the "equivalence" 
> concept as defined for Ordered containers is certain to be transitive 
> and thus whether this "equivalence" really is an equivalence 
> relation. 

Interesting.  The requirements on the formal operators "=" and "<" were
nailed down fairly late in the game, and I suppose that we didn't do our
homework properly.  In a private email date June 1st, 2005, I wrote "we
could say that the relation "not (K1 < K2) and not (K2 < K1)" has to be an
equivalence" but apparently we didn't opt for that formulation (I suppose
that we thought that "strict ordering" was sufficient).

For an implementation based on a binary tree to work the "equivalence"
really has to be an equivalence relation, otherwise the result that you
get depends on the order of insertions (because it depends on the
representative that you choose to store in a node).  

Tuck replied:

> A reexpression of the requirement that
> more nearly matches these words would be:
> 
>     For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)

As Simon noticed this exactly matches the notion of "transitivity of
equivalence" as defined by the Wikipedia article on strict weak ordering,
so it does look like the way to go.

Randy commented:

> In any case, our definition of mathematical terms is given in 
> by the document defined in 1.3 of the Standard. I tried the 
> online version of it (www.mathworld.com), and did not find 
> any definition of "strict weak ordering". That means 
> (presuming I didn't screw up in some way) that we can't use 
> that term in the RM, as it is not defined by our reference. 
> (Or we'd have to define it ourselves.)

Well, in practice we define the term "strict ordering" because we write
"it should define a strict ordering relationship, that is, be irreflexive,
asymmetric, and transitive".  But we rely on our reference document to
define "irreflexive", "asymmetric", and "transitive".

In A.18(3-4) we give a definition of the big-O notation.  It would be
sensible to append there a definition of "strict weak ordering" and use
that term whenever we describe the requirements on "<", with a
cross-reference "see A.18".

****************************************************************

From: Christoph Grein
Date: Thursday, January 25, 2007  6:47 AM

I'm not sure where to put the following definition, but I think it should be included
in the RM.

A <strict weak ordering> has the following properties. For all x, y and z:

For all x, it is not the case that x < x (irreflexivity). 
For all x /= y, if x < y then it is not the case that y < x (antisymmetry). 
For all x, y, and z, if x < y and y < z then x < z (transitivity). 
For all x, y, and z, if x is incomparable with y, and y is incomparable with z,
then x is incomparable with z (transitivity of equivalence).

****************************************************************

From: Niklas Holsti
Date: Monday, January 29, 2007  3:40 PM

>>A reexpression of the requirement that
>>more nearly matches these words would be:
>>
>>    For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)
> 
> 
> As Simon noticed this exactly matches the notion of "transitivity of
> equivalence" as defined by the Wikipedia article on strict weak ordering,
> so it does look like the way to go.

Expanding the area of discussion a bit, should this requirement for 
"weak" ordering, or transivity of equivalence, also apply to the generic 
formal "<" operators of Ada.Containers.Vectors.Generic_Sorting, 
Ada.Containers.Doubly_Linked_Lists.Generic_Sorting and the generic array 
sorting procedures in A.18.16?

I think it should. I tried to understand what the RM means by "sorted 
smallest first" in these places, but my attempts at writing this in 
terms of "<" between adjacent elements suggest that incomparability 
should be transitive here, too. Moreover, googling "Quicksort strict 
weak" locates several pages that require "strict weak ordering" for 
Quicksort.

Whatever the answer, I suggest that the RM should define what is meant 
by "sorted smallest first". It is not obvious, if the definition can use 
only the "<" operator and the container can contain incomparable elements.

****************************************************************

From: Tucker Taft
Date: Monday, January 29, 2007  5:36 PM

Yes, I think you are right that the additional
requirement on "<" should apply in these additional
places.  Presumably the definition of being sorted
smallest first is that for the final "Sorted" array:

  for I,J indices into Sorted:

       I < J implies not(Sorted(J) < Sorted(I))

****************************************************************

From: Niklas Holsti
Date: Tuesday, January 30, 2007  2:54 AM

> Yes, I think you are right that the additional
> requirement on "<" should apply in these additional
> places.

Good.

> Presumably the definition of being sorted
> smallest first is that for the final "Sorted" array:
> 
>  for I,J indices into Sorted:
> 
>       I < J implies not(Sorted(J) < Sorted(I))

Yes, that is one possible formulation; it describes the Sorted array as 
"globally non-decreasing".

An equivalent one, which I prefer because it has no negation and because 
it is closer to the words "smallest first" is

    for I,J indices into Sorted:

       Sorted(I) < Sorted(J) implies I < J

If "<"-equivalence (incomparability) is transitive, it is enough to 
compare adjacent elements and check that Sorted(I+1) < Sorted(I) is 
False for all indices I in First .. Last-1. This form describes the 
Sorted array as "locally non-decreasing". Using this form in the 
function Is_Sorted gives O(N) complexity, as one would expect for that 
function.

****************************************************************

From: Tucker Taft
Date: Tuesday, January 30, 2007  1:36 PM

> An equivalent one, which I prefer because it has no negation and because 
> it is closer to the words "smallest first" is
> 
>    for I,J indices into Sorted:
> 
>       Sorted(I) < Sorted(J) implies I < J

That does seem more natural, and would seem to be a worthwhile
addition to the manual as a definition of "smallest first."

****************************************************************

From: Niklas Holsti
Date: Wednesday, January 24, 2007  2:00 PM

!topic Formal "=" for Element_Type in Set containers
!reference Ada 2005 RMA.18.4(3/2)
!reference Ada 2005 RMA.18.7(3/2)
!from Niklas Holsti 07-01-24
!discussion

All generic packages in Ada.Containers for Maps and Sets take a generic 
formal "=" operator for the Element_Type. The actual "=" operator for 
Element_Type, supplied for the instantiation, is expected to be 
deterministic, reflexive and symmetric. This is stated in A.18.4(3/2) 
for Maps and in A.18.7(3/2) for Sets.

The Map packages have no other generic formal functions taking 
Element_Type values, thus the question to be discussed here does not 
arise for Maps.

The Set packages have other generic formal functions taking Element_Type 
parameters: Equivalent_Elements and Hash (for Hashed_Sets) and "<" (for 
Ordered_Sets). The question I want to raise here is the puzzling fact 
that the RM apparently requires and expects no relationship between the 
"=" operator on Element_Type and the functions Equivalent_Elements, Hash 
and "<".

For example, in the case of two elements E1 and E2 in an Ordered_Set it 
would be very natural to require that if E1 "=" E2 then we cannot also 
have E1 "<" E2 or E2 "<" E1 (in other words E1 should be "equivalent" to 
E2, using the definition of equivalent elements given in A.18.9(78/2)). 
But I can find no RM text in A.18 that explicitly says this.

Apart from the RM paragraphs referenced above, I found only the 
following mentions of the "=" operator on Element_Type for Sets:

- In A.18.7(20/2), which says that "=" on Set objects uses the
   formal "=" on Element_Type. (The Annotated RM says that this
   is the only place where "=" for Element_Type should be used.)

- In A.18.9(80/2), which says that elements should not be changed
   so that the "=" operator gives a different result (except by
   operations in the Sets package itself).

Neither of these paragraphs specifies any relationship between the "=" 
operator on Element_Type and the other formal functions with 
Element_Type parameters.

The "=" operator on Sets is defined in A.18.7(20/2). This definition 
requires each element of the Left set to be compared for "=" with the 
elements of the Right set, using the formal "=" operator on 
Element_Type. If the RM really puts no constraints on the behaviour of 
"=" on Element_Type with respect to the functions Equivalent_Elements, 
Hash and "<" on Element_Type, two problems arise with this definition of 
"=" on Sets:

1. I don't see how the "=" operator on Sets could be implemented in less 
than O(N**2) time, because the implementation could not use the Hashed 
or Ordered structure to limit the number of Element_Type pairs from Left 
and Right that must be compared with "=".

2. The "=" operator on Sets could return True for two sets Left and 
Right although the Right set contains elements that are not "=" to any 
elements of the Left set. This is because the definition is asymmetric 
(it "traverses" Left and "searches" Right), so two different elements of 
Left could be "=" to the same element of Right, leaving some elements of 
Right unmatched with elements of Left.

I think (and discussion on comp.lang.ada supports it) that the 
definition of "=" on Sets expects the implementation to compare only 
"equivalent" elements of the Left and Right Sets. If this "equivalence" 
is transitive (see my earlier comment on the topic "Equivalent 
keys/elements in Ordered Sets and Maps") this means that it is enough to 
compare each element of Left with "=" to exactly one element of Right 
(or, for Hashed_Sets, to the elements in one bucket of Right). This 
makes the "=" operator on Sets take O(N) time, as desired.

As a parenthesis, from discussion on comp.lang.ada it occurs to me that 
the RM authors may have intended to specify relationships between the 
"=" operator on Element_Type and the other formal functions with 
Element_Type parameters *indirectly*, through the properties expected of 
these other formal functions, as follows:

For Hashed_Sets A.18.8(66/2) says that the actual Equivalent_Sets 
function is expected to be reflexive. According to Wikipedia, a 
(mathematical) relation "~" is reflexive if E ~ E for all E, thus on the 
Ada level I understand this RM paragraph to expect only that 
Equivalent_Sets (E, E) is True for all elements E. Obviously this does 
not involve the "=" operator on Element_Type.

However, a *mathematically* equivalent definition of a reflexive 
relation "~" is that E1 = E2 implies E1 ~ E2, for all E1 and E2. Perhaps 
RM A.18.8(66/2) intends a similar formulation on the Ada level, that is: 
for any two elements E1 and E2, if E1 = E2 (using the formal "=" 
operator) then Equivalent_Elements (E1, E2) is expected to be True. This 
would define a natural relationship between the actual "=" for 
Element_Type and the actual function for Equivalent_Elements.

The two mathematical definitions of "reflexive relation" are logically 
equivalent because mathematics is referentially transparent (I hope I am 
using this term more or less correctly): if E1 = E2 then E1 can be 
substituted for E2 in any expression that uses E2 (as a free variable) 
with no change in the value of the expression.

The two Ada-level definitions of the reflexivity of Equivalent_Elements 
are not logically equivalent because Ada expressions are not 
referentially transparent, neither with respect to the predefined "=" 
operators (for example, E1 = E2 does not imply E1'Address = E2'Address) 
nor with respect to user-defined "=" operators.

To continue in the same way, for Ordered_Sets A.18.9(79/2) says that the 
formal "<" function on Element_Type is expected to be irreflexive. 
Following the normal mathematical definition this means on the Ada-level 
that E < E is False for all E. As for Hashed_Sets a mathematically 
equivalent reformulation can translate this into the Ada-level 
expectation that E1 < E2 implies that E1 = E2 is False, using the formal 
"<" and "=" operators. Again this would define a natural relationship 
between the actual "=" and "<" operators for Element_Type.

The RM authors may not have intended these alternative Ada-level 
formulations of "reflexive" and "irreflexive" properties, so I will 
close this parenthesis here and proceed to my recommendation.

To enforce or clarify the expected relationship between "=" on 
Element_Type and the other formal functions taking Element_Type 
parameters, I suggest to add a sentence in A.18.7 saying that the actual 
function for the generic function "=" on Element_Type is expected to 
return True only when its Left and Right parameters are equivalent 
elements in the equivalence relation introduced in paragraph 
A.18.7(5/2). In other words, if E1 = E2 is True then E1 and E2 are 
expected to be equivalent.

For Hashed_Sets this means that if E1 = E2 is True then so is 
Equivalent_Elements (E1, E2), which (by existing rules) means that E1 
and E2 have the same Hash value.

For Ordered_Sets this means that if E1 = E2 is True then E1 < E2 is 
False and E2 < E1 is also False.

It may be better to reorder the paragraphs in this area so that the new 
sentence can be put in the same paragraph with the sentences in the 
current paragraph (3/2) that define the other expected properties of the 
"=" function. The equivalence relation should be introduced first, 
followed by the expected properties of "=".

****************************************************************

From: Pascal Leroy
Date: Thursday, January 25, 2007  4:27 AM

> Neither of these paragraphs specifies any relationship 
> between the "=" operator on Element_Type and the other formal functions
with 
> Element_Type parameters.

At the time we realized that we needed to be very precise in specifying
the behavior of the formal functions that define the equivalence (Hash and
"<") because they are essential to the integrity of the container.

On the other hand, we thought that, since the formal "=" operator is just
used for "=" of sets and maps, it does not play any role in the integrity
of the container, and that you get what you get.  The requirements that we
imposed on "=" only reflect the fact that we didn't want to constrain how
many times the implementation calls it and with what parameters.

> 1. I don't see how the "=" operator on Sets could be 
> implemented in less 
> than O(N**2) time

I agree that this is not acceptable.

> 2. The "=" operator on Sets could return True for two sets Left and 
> Right although the Right set contains elements that are not 
> "=" to any 
> elements of the Left set. This is because the definition is 
> asymmetric 

I have never been in love with this asymmetric definition.  I suppose that
we could fix the definition, but that would leave us with problem (1)
above.  At any rate, the definition makes it clear that the intent was
"scan one set, do a lookup in the other, compare the elements".

> the RM authors may have intended to specify relationships between the 
> "=" operator on Element_Type and the other formal functions with 
> Element_Type parameters *indirectly*

No, that was not the intent, we missed problem (1) above, but we were not
confused that the formal "=" operator is very different from mathematical
equality.

> I suggest to add a sentence in A.18.7 saying that the actual 
> function for the generic function "=" on Element_Type is expected to 
> return True only when its Left and Right parameters are equivalent 
> elements in the equivalence

I agree that this is the best fix.

Note however that "=" on sets is fraught with difficulties because its
result will depend on the representative of each equivalence class that is
actually stored in the set.  If you ever insert two equivalent elements in
a given set, you are treading on thin ice.  Two sets that only differ
because they were created in different orders (but with the same elements)
are unlikely to compare equal.  Heck, if an implementation internally
introduces some amount of randomization (perhaps due to storage
management) two sets created in the same order might even compare unequal.
So this strange function can be trusted when it return True, not when it
returns False.  Hmm.

****************************************************************

From: Matthew Heaney
Date: Thursday, January 25, 2007 12:21 PM

>> 1. I don't see how the "=" operator on Sets could be 
>> implemented in less 
>> than O(N**2) time
> 
> I agree that this is not acceptable.

O(n) is the worst that "=" could ever do.  In practice you'll do better 
since when equality for elements compares false you terminate the 
iteration immediately.

> Note however that "=" on sets is fraught with difficulties because its
> result will depend on the representative of each equivalence class that is
> actually stored in the set.  If you ever insert two equivalent elements in
> a given set, you are treading on thin ice.  

But the second insertion will be rejected (since an equivalent element 
is already in the set), so there is no issue.

****************************************************************

From: Matthew Heaney
Date: Thursday, January 25, 2007  1:18 PM


> The Set packages have other generic formal functions taking Element_Type 
> parameters: Equivalent_Elements and Hash (for Hashed_Sets) and "<" (for 
> Ordered_Sets). The question I want to raise here is the puzzling fact 
> that the RM apparently requires and expects no relationship between the 
> "=" operator on Element_Type and the functions Equivalent_Elements, Hash 
> and "<".

No, this is incorrect.  There certainly *is* an expectation that 
equality implies equivalence (even if this is not stated explicitly).

> For example, in the case of two elements E1 and E2 in an Ordered_Set it 
> would be very natural to require that if E1 "=" E2 then we cannot also 
> have E1 "<" E2 or E2 "<" E1 (in other words E1 should be "equivalent" to 
> E2, using the definition of equivalent elements given in A.18.9(78/2)). 
> But I can find no RM text in A.18 that explicitly says this.

Equality for sets works by iterating over the left set, and searching 
for each element in the right set.  If no equivalent element is found 
then the operation terminates immediately with False as the result.  If 
the element is found (meaning that the elements are equivalent), then 
the elements are compared for equality.  If element equality returns 
False, then the operation terminates immediately with the result False. 
  Otherwise it continues the iteration until you run out of elements.

Equality for sets certainly does NOT work by performing a linear search 
for the element.  That would be a truly useless operation.

(For the implementation of an ordered set, no search is necessary since 
you can simply have a pair of cursors to walk both internal trees 
simultaneously.)

> Apart from the RM paragraphs referenced above, I found only the 
> following mentions of the "=" operator on Element_Type for Sets:
...
> Neither of these paragraphs specifies any relationship between the "=" 
> operator on Element_Type and the other formal functions with 
> Element_Type parameters.

Equality should imply equivalence.  If you define equality differently 
then you might not get the result you want.

> 1. I don't see how the "=" operator on Sets could be implemented in less 
> than O(N**2) time, because the implementation could not use the Hashed 
> or Ordered structure to limit the number of Element_Type pairs from Left 
> and Right that must be compared with "=".

No, this is all wrong.  A non-linear search is first performed (meaning 
that the elements are compared for equivalence using the canonical 
mechanisms), and then a test for equality is performed.

You're probably confused because RM05 A.18.7 (20/2) says:

"Otherwise, for each element E in Left, the function returns False if an 
element equal to E (using the generic formal equality operator) is not 
present in Right."

But this does NOT mean that there's a linear search for an equal item. 
It DOES mean that there's a non-linear search for an equivalent item, 
followed by a test for equality.

> 2. The "=" operator on Sets could return True for two sets Left and 
> Right although the Right set contains elements that are not "=" to any 
> elements of the Left set. This is because the definition is asymmetric 
> (it "traverses" Left and "searches" Right), so two different elements of 
> Left could be "=" to the same element of Right, leaving some elements of 
> Right unmatched with elements of Left.

You're confused because you seem to think that element equality is used 
to implement the search.  But that's wrong: element equality is only 
used *after* the search (which finds an equivalent item).

> However, a *mathematically* equivalent definition of a reflexive 
> relation "~" is that E1 = E2 implies E1 ~ E2, for all E1 and E2. Perhaps 
> RM A.18.8(66/2) intends a similar formulation on the Ada level, that is: 
> for any two elements E1 and E2, if E1 = E2 (using the formal "=" 
> operator) then Equivalent_Elements (E1, E2) is expected to be True. This 
> would define a natural relationship between the actual "=" for 
> Element_Type and the actual function for Equivalent_Elements.

Yes, that's correct.  If element equality returns True, then 
Equivalent_Elements should return True.

Of course, if you define element equality differently, then equality on 
sets might not return the value you want.

If you want set equality to imply set equivalence, then you have to 
define the generic actual for element equality such that it only returns 
True when the elements are equivalent.

> To enforce or clarify the expected relationship between "=" on 
> Element_Type and the other formal functions taking Element_Type 
> parameters, I suggest to add a sentence in A.18.7 saying that the actual 
> function for the generic function "=" on Element_Type is expected to 
> return True only when its Left and Right parameters are equivalent 
> elements in the equivalence relation introduced in paragraph 
> A.18.7(5/2). In other words, if E1 = E2 is True then E1 and E2 are 
> expected to be equivalent.

Why is this necessary?  If element equality does not imply equivalence 
then set equality will not imply set equivalence.  What's the problem?

****************************************************************

From: Tucker Taft
Date: Thursday, January 25, 2007  5:46 PM

> Equality for sets works by iterating over the left set, and searching 
> for each element in the right set.  If no equivalent element is found 
> then the operation terminates immediately with False as the result. 

That isn't what the RM says in A.18.7(20).  It makes no mention of
equivalence.  If we want the above to be a legitimate
implementation, then we need to include a mention of
equivalence in A.18.7(20) or add a requirement that
"=" on elements implies equivalence or the result of "="
on sets is unspecified.

> Equality should imply equivalence.  If you define equality differently 
> then you might not get the result you want.

But the RM has to say that somewhere.

>> ...
>> To enforce or clarify the expected relationship between "=" on 
>> Element_Type and the other formal functions taking Element_Type 
>> parameters, I suggest to add a sentence in A.18.7 saying that the 
>> actual function for the generic function "=" on Element_Type is 
>> expected to return True only when its Left and Right parameters are 
>> equivalent elements in the equivalence relation introduced in 
>> paragraph A.18.7(5/2). In other words, if E1 = E2 is True then E1 and 
>> E2 are expected to be equivalent.
> 
> Why is this necessary?  If element equality does not imply equivalence 
> then set equality will not imply set equivalence.  What's the problem?

The problem is that the definition of "=" on sets does not
mention equivalence, nor is there any mention of the relationship
between "=" on elements and equivalence, so you as an implementor
can't use equivalence to implement set "=".  If you want to use
equivalence, then the RM needs to be augmented with words that
authorize that implementation.

So I agree with the original comment by Niklas, that we need to
add something to the RM wording in this section.

****************************************************************

From: Matthew Heaney
Date: Thursday, January 25, 2007  9:13 PM

> > Equality for sets works by iterating over the left set, and searching 
> > for each element in the right set.  If no equivalent element is found 
> > then the operation terminates immediately with False as the result.
> 
> That isn't what the RM says in A.18.7(20).  It makes no 
> mention of equivalence.  If we want the above to be a 
> legitimate implementation, then we need to include a mention 
> of equivalence in A.18.7(20) or add a requirement that "=" on 
> elements implies equivalence or the result of "="
> on sets is unspecified.

OK, I'm certainly in favor of the RM requiring (or otherwise mentioning)
that element equality must imply element equivalence.  The explicit
requirement would allow me to put some assertions in GNAT to verify the
implication.  (In GNAT if you compile a container instantiation with
assertions enabled, you get a bunch of extra checks.  That's how I detect
dangling cursors, for example.)

> > Equality should imply equivalence.  If you define equality differently 
> > then you might not get the result you want.
> 
> But the RM has to say that somewhere.

OK.
 
> > Why is this necessary?  If element equality does not imply equivalence 
> > then set equality will not imply set equivalence.  What's the problem?
> 
> The problem is that the definition of "=" on sets does not 
> mention equivalence, nor is there any mention of the 
> relationship between "=" on elements and equivalence, so you 
> as an implementor can't use equivalence to implement set "=". 

OK.

****************************************************************

From: Pascal Leroy
Date: Friday, January 26, 2007  1:17 AM

> >> 1. I don't see how the "=" operator on Sets could be
> >> implemented in less 
> >> than O(N**2) time
> > 
> > I agree that this is not acceptable.
> 
> O(n) is the worst that "=" could ever do.

Duh?  For an ordered set I don't see how you can do better than
O(n*Log(n)) since the search is going to require O(Log(n)) comparisons
anyway.

> In practice you'll do better 
> since when equality for elements compares false you terminate the 
> iteration immediately.

Matt, this is irrelevant to the asymptotic complexity.  It doesn't help
that in some cases you'll stop after the first comparison.  If the sets
compare equal surely you'll have to check all the elements.  And if they
don't, on average you'll have to check n/2 elements.

> > Note however that "=" on sets is fraught with difficulties because its 
> > result will depend on the representative of each equivalence class 
> > that is actually stored in the set.  If you ever insert two equivalent 
> > elements in a given set, you are treading on thin ice.
> 
> But the second insertion will be rejected (since an equivalent element 
> is already in the set), so there is no issue.

Not if you call Include or the 4-parameter Insert to add the element to
the set.

The scenario is this: you have a collection of elements, and you add them
to two sets, perhaps using Include, in different order.  Chances are that
the resulting sets will not compare equal, even though they come from the
same elements.  (They will be equivalent, though.)

Because of this counter-intuitive semantics, I believe that "=" on sets is
going to lead to bugs in user code.  Most people won't realize that they
want to use Equivalent_Sets, not "=", to get a reliable answer.  The "="
operator as defined is junk and should not have been included in the
package.  Sigh.

****************************************************************

From: Matthew Heaney
Date: Friday, January 26, 2007  7:57 AM

> Duh?  For an ordered set I don't see how you can do better than
> O(n*Log(n)) since the search is going to require O(Log(n)) 
> comparisons anyway.

Yes, you're right.  I was wearing my implementation hat, since for an
ordered set you can simply do an pairwise comparison; no O(log(n)) search is
necessary.
 
> > In practice you'll do better
> > since when equality for elements compares false you terminate the 
> > iteration immediately.
> 
> Matt, this is irrelevant to the asymptotic complexity.  It 
> doesn't help that in some cases you'll stop after the first 
> comparison.  If the sets compare equal surely you'll have to 
> check all the elements.  And if they don't, on average you'll 
> have to check n/2 elements.

Right.

> > But the second insertion will be rejected (since an equivalent element 
> > is already in the set), so there is no issue.
> 
> Not if you call Include or the 4-parameter Insert to add the 
> element to the set.

I still don't see this.  The 4-param Insert doesn't change anything, it
simply reports whether the insertion succeeds using a Boolean instead of an
exception.

> The scenario is this: you have a collection of elements, and 
> you add them to two sets, perhaps using Include, in different 
> order.  Chances are that the resulting sets will not compare 
> equal, even though they come from the same elements.  (They 
> will be equivalent, though.)

But wouldn't this only be true for a multi-set (and only if you don't
specify where the new item goes relative to existing equivalent items)?  In
our case (a unique-element set), if an equivalent item is already in the
set, then the insert will fail.  (Unless you're talking about Include, but
then the new item would replace what's there, so the same element will still
be in both sets.)

> Because of this counter-intuitive semantics, I believe that 
> "=" on sets is going to lead to bugs in user code.  Most 
> people won't realize that they want to use Equivalent_Sets, 
> not "=", to get a reliable answer.  The "="
> operator as defined is junk and should not have been included 
> in the package.  Sigh.

I still don't understand this.  Nor do I understand how you could get rid of
set equality if the set type is non-limited.

****************************************************************

From: Pascal Leroy
Date: Friday, January 26, 2007  8:31 AM

> I still don't see this.  The 4-param Insert doesn't change 
> anything, it simply reports whether the insertion succeeds 
> using a Boolean instead of an exception.
> 
> But wouldn't this only be true for a multi-set (and only if 
> you don't specify where the new item goes relative to 
> existing equivalent items)?  In our case (a unique-element 
> set), if an equivalent item is already in the set, then the 
> insert will fail.  (Unless you're talking about Include, but 
> then the new item would replace what's there, so the same 
> element will still be in both sets.)

An example might help: consider a set of strings, and assume that the "<"
is case-insensitive lexicographic comparison (so equivalence is
case-insensitive equality).  Equality of strings is, of course,
case-sensitive.

Now consider the insertions:

	Include (S1, "Ada");
	Include (S1, "ADA");
	Include (S2, "ADA");
	Include (S2, "Ada");

After these calls you have S1={"ADA"} and S2={"Ada"} so S1 and S2 are
equivalent (fine) but not equal.  Surely the semantics is well-defined,
but I find it extremely counter-intuitive, given that the two sets were
created from the same string literals.  In real life it might not be
obvious that the insertions happened in different orders, and anyone
depending on equality of sets might get some surprising results.  (To make
life more interesting, imagine that the insertions of "Ada" and the
insertions of "ADA" happen on different tasks, with proper
synchronization, but in a non-deterministic order.)

Of course, the same problem may happen with the 4-parameter Insert (except
that in this case the first insertion, not the last one, wins).

> Nor do I understand how you 
> could get rid of set equality if the set type is non-limited.

This is water under the bridge now, but I think that "=" should have been
a mere renaming of Equivalent_Sets.

****************************************************************

From: Tucker Taft
Date: Friday, January 26, 2007 10:05 AM

> This is water under the bridge now, but I think that "=" should have been
> a mere renaming of Equivalent_Sets.

Your points above are valid, but I don't agree with your conclusion.
If two sets are equal, then I would hope there are essentially
indistinguishable.  But these are clearly distinguishable,
because if you iterate through the two sets, you get different
things.  If we wanted to achieve your implied goal that,
if you insert the same strings, but in different orders, you
end up with the same result, we would need to provide another
formal function that would allow the implementation to compare
two equivalent elements and decide which one to put in the set.
But we clearly don't have that.  So I think the distinction
between Equivalent_Elements and element "=" is directly analogous
to the Equivalent_Sets and set "=", and so I think (once we
make it clear that equal-element should imply equivalent-element),
that we have the right semantics here.

****************************************************************

From: Matthew Heaney
Date: Friday, January 26, 2007 10:22 PM

> An example might help: consider a set of strings, and assume that the "<"
> is case-insensitive lexicographic comparison (so equivalence is
> case-insensitive equality).  Equality of strings is, of course,
> case-sensitive.

Yes, the predefined equality operator for type String is case sensitive, 
but if you have a case insensitive less-than operator then you have to 
match it with a case insensitive equality operator in the instantiation 
(if case-insensitive set equality is what you want).

> Now consider the insertions:
> 
> 	Include (S1, "Ada");
> 	Include (S1, "ADA");
> 	Include (S2, "ADA");
> 	Include (S2, "Ada");
> 
> After these calls you have S1={"ADA"} and S2={"Ada"} so S1 and S2 are
> equivalent (fine) but not equal.  

But it depends on the generic actual equality operator for element type. 
  The fact that it's type String doesn't make any difference.

> Surely the semantics is well-defined,
> but I find it extremely counter-intuitive, given that the two sets were
> created from the same string literals.  

But if the generic actual equality operator is case sensitive, then the 
sets are going to compare unequal.  Why do you think that that's 
counter-intuitive?

If you want the sets to compare equal, then you have to provide a 
generic actual equality operator that's case insensitive.

In GNAT, I augmented then container library with the following functions:

Ada.Strings.Less_Case_Insensitive
Ada.Strings.Equal_Case_Insensitive
Ada.Strings.Hash_Case_Insensitive

If you want string sets to compare equal, then you simply have to say:

with Ada.Containers.Indefinite_Ordered_Sets;
with Ada.Containers.Less_Case_Insensitive;
with Ada.Containers.Equal_Case_Insensitive;

package String_Sets is
   new Ada.Containers.Indefinite_Ordered_Sets
     (String,
      "<" => Ada.Containers.Less_Case_Insensitive,
      "=" => Ada.Containers.Equal_Case_Insensitive);

and now all is well.

> In real life it might not be
> obvious that the insertions happened in different orders, and anyone
> depending on equality of sets might get some surprising results.  

If the general actual equality operator is case sensitive, then there is 
no surprise: the sets compare unequal.

If this would cause surprise, then you solve that problem by supplying a 
case insensitive element equality operator too.

This issue is exactly why we have a generic formal element equality 
operator.  It's up to the developer to supply a generic actual that does 
not cause surprise (where "surprise" is defined by him).

>> Nor do I understand how you 
>> could get rid of set equality if the set type is non-limited.
> 
> This is water under the bridge now, but I think that "=" should have been
> a mere renaming of Equivalent_Sets.

I disagree.  I think we made exactly the right decision by declaring a 
generic formal equality operator for the element type, and leaving it up 
to the developer to pass in (or not) whatever generic actual makes sense 
for his application.

****************************************************************

From: Pascal Leroy
Date: Monday, January 29, 2007  2:26 AM

> If you want string sets to compare equal, then you simply have to say:
> 
> package String_Sets is
>    new Ada.Containers.Indefinite_Ordered_Sets
>      (String,
>       "<" => Ada.Containers.Less_Case_Insensitive,
>       "=" => Ada.Containers.Equal_Case_Insensitive);

If you believe that the "=" and "<" should be consistent, then:

1 - Why the hell do we require that the user pass in "=" when we can
deduce equivalence from the given "<"?

2 - This is a requirement on the user, and it should be spelled out in the
RM.

We agreed earlier that equality should imply equivalence.  My
understanding is that you are now saying that equality should be the same
as equivalence, otherwise you'll have dependencies on the order of
insertion.

****************************************************************

From: Pascal Leroy
Date: Monday, January 29, 2007  2:32 AM

> If two sets are equal, then I would hope there 
> are essentially indistinguishable.  But these are clearly 
> distinguishable, because if you iterate through the two sets, 
> you get different things.

Yes, you can always distinguish sets through iterations, but that doesn't
bother me too much, since you are in some sense "dissecting" the set.  In
general, the operations of this package that deal with entire sets (not
doing any dissection) rely on element equivalence, not element equality.
This makes it possible to preserve the natural mathematical invariants,
for instance the fact that the intersection is always a subset of the
union.

On the other hand, when you start using "=", all bets are off regarding
these mathematical invariants.  The fact that two sets are equal, for
instance, doesn't mean that their symmetric difference is empty.  I think
this is treacherous.

> If we wanted to achieve your 
> implied goal that, if you insert the same strings, but in 
> different orders, you end up with the same result, we would 
> need to provide another formal function that would allow the 
> implementation to compare two equivalent elements and decide 
> which one to put in the set. But we clearly don't have that.  

That's not what I am trying to achieve.  It would not be very practical,
because deciding how to canonicalize the set could be hard.  But I think
that, as a set-theoretical operation, "=" is badly broken.

> So I think the distinction between Equivalent_Elements and 
> element "=" is directly analogous to the Equivalent_Sets and 
> set "=", and so I think (once we make it clear that 
> equal-element should imply equivalent-element), that we have 
> the right semantics here.

I guess that discussions in the abstract don't go anywhere.  I would like
to see a practical example where the distinction between "=" and
Equivalent_Sets is actually useful.

We are obviously not going to change the specs of these packages now.  But
I would like to have a note warning the user that, if element equality and
element equivalence are not the same, the result of set equality will in
general depend on the sequence of calls used to create each set.

****************************************************************

From: Matthew Heaney
Date: Monday, January 29, 2007  9:21 AM

> If you believe that the "=" and "<" should be consistent, then:

They should be consistent if you want set equality to imply set equivalence.

> 1 - Why the hell do we require that the user pass in "=" when we can
> deduce equivalence from the given "<"?

I don't understand this.  You can deduce equivalence from "<", but you 
cannot deduce equality, so you have to pass in equality too.

> 2 - This is a requirement on the user, and it should be spelled out in the
> RM.

The only requirement on the user is that if they want set equality to 
imply set equivalence, then the user is required to pass in an equality 
operator such that element equality implies element equivalence.

> We agreed earlier that equality should imply equivalence.  

Yes, but only if set equality is to imply set equivalence.  The user 
decides whether this is true.  (And if he decides in the affirmative, 
then he has the obligation to pass in a suitable element equality operator.)

> My
> understanding is that you are now saying that equality should be the same
> as equivalence, otherwise you'll have dependencies on the order of
> insertion.

No, that's not what I'm saying at all.  I'm saying that if set equality 
is to imply set equivalence, then element equality must imply element 
equivalence.  That's very different from saying that "equality should be 
the same as equivalence."

****************************************************************

From: Niklas Holsti
Date: Monday, January 29, 2007 12:39 PM

>> 2 - This is a requirement on the user, and it should be spelled out in 
>> the RM.
> 
> 
> The only requirement on the user is that if they want set equality to 
> imply set equivalence, then the user is required to pass in an equality 
> operator such that element equality implies element equivalence.

More than that: if element equality does not imply element equivalence, 
the fast algorithm to test equality of sets is wrong because it compares 
only equivalent elements; an O(N**2) algorithm would be needed.

****************************************************************

From: Tucker Taft
Date: Monday, January 29, 2007  1:37 PM

Before we get too confused by the philosophical discussion,
I think we all now agree that we *do* need to make a
substantive change in the RM wording to state that:

   if element "=" does *not* imply element equivalence then
   the result of set-wise "=" is not specified.

This is required to allow implementations to compare only
equivalent elements to determine the result of set-wise "=".

****************************************************************

From: Pascal Leroy
Date: Tuesday, July 17, 2007  2:58 AM

The last bit of wording in this AI [version /01 - ED] has:

Add "If the actual for the generic formal function "=" does not return
True for any pair of equivalent elements, then the behavior of the
container function "=" is unspecified."
to A.18.8(65.2) and A.18.9(79/2).

This is wrong.  As explained in the email thread of the AI, the
requirement ought to be that equality implies equivalence, not the other
way around.  Think of a set of strings: equality could be normal string
equality, and equivalence could be based on case-insensitive comparison.
This is fine, as it makes it possible to implement set equality in linear
time, but it fails the above rule.

The wording should be changed to:

Add "If the actual for the generic formal function "=" returns True for
any pair of non-equivalent elements, then the behavior of the container
function "=" is unspecified."
to A.18.8(65.2) and A.18.9(79/2).

****************************************************************


Questions? Ask the ACAA Technical Agent