!standard A.18(4/2) 07-04-02 AI05-0044-1/01
!standard A.18.2(231/2)
!standard A.18.3(145/2)
!standard A.18.6(57/2)
!standard A.18.8(65/2)
!standard A.18.9(79/2)
!standard A.18.16(5/2)
!standard A.18.16(9/2)
!class binding interpretation 07-04-02
!status work item 07-04-02
!status received 07-01-24
!priority Medium
!difficulty Medium
!qualifier Omission
!subject Equivalence and equality in containers
!summary
The notion of "strict weak ordering" is defined. All "<" formal subprograms are
required to have a strict weak ordering.
"Sorted smallest first" also is defined.
"=" formal functions for the Set containers are required to return True for equivalent
elements.
!question
The equivalence relationship of ordered containers is assumed to be transitive. However,
that does not follow from the requirements on "<" and the definition of equivalence.
Consider the following function:
type T1 is (X, Y, Z);
function "<" (Left, Right : T1) return Boolean is
-- This function implements the ordering:
-- X < Z
-- No other pairs are comparable.
begin
return Left = X and Right = Z;
end "<";
This function meets the requirements on "<". But the equivalence relationship derived from
it is not transitive. This could cause many bizarre effects in a container.
Similar issues arise for "<" as passed to various sorting routines.
If "<" allows incomparable items, it's impossible to understand what the meaning of
"smallest first" is.
A related issue is that the requirements on "=" for sets are not related to equivalence. That
means that "=" on a set must take O(N**2), as checking only equivalent elements would be making
an unwarrented assumption.
!recommendation
(See Summary.)
!wording
Define "strict weak ordering" and "smallest first" after A.18(4/2):
A *strict weaking ordering* for an operator "<" is one that is be irreflexive,
asymmetric, and transitive. In addition, if x < y is True for any values x and y,
then for all other values z, (x < z) or (z < y) is True.
A container is sorted *smallest first* based on an operator "<" if, for any
two elements of a container, Elem1 < Elem2 implies that the position of Elem1
in the container is closer to the the first element than the position of Elem2.
Replace the second sentence of A.18.2(231/2) by
It should define a strict weak ordering relationship (see A.18); it should not modify
Container.
Make similar changes to A.18.3(145/2), A.18.6(57/2), A.18.9(79/2), A.18.16(5/2),
and A.18.16(9/2).
Add "If the actual for the generic formal function "=" does not return True for any
pair of equivalent elements, then the behavior of the container function "=" is unspecified."
to A.18.8(65.2) and A.18.9(79/2).
!discussion
A "<" actual function is required to be irreflexive, asymetric, and transitive. The
formal definitions of these terms are:
- irreflexive: X < X is False for all X.
- asymmetric: if X < Y is True, then Y < X is False.
- transitive: if X < Y is True and Y < Z is True then X < Z is True.
Note that the latter two say nothing about the case where X < Y is False. Thus
it is OK to have both X < Y = False and Y < X = False and still be asymeteric. Indeed, this
is the definition of equivalence, so it is clearly fine. What is not OK if for there to be
a value which has is equivalent to two other values which are not equivalent (that is,
that have "<" returning True).
That is, we also need transitively of equivalence, or more precisely.
if x < y is True, then for all z, (x < z) or (z < y) is True.
--!corrigendum 10.1.2(20/2)
!ACATS test
!appendix
From: Niklas Holsti
Date: Wednesday, January 24, 2007 8:11 AM
!topic Equivalent keys/elements in Ordered Maps and Sets
!reference Ada 2005 RMA.18.6(55/2)
!reference Ada 2005 RMA.18.9(78/2)
!from Niklas Holsti 07-01-24
!discussion
The generic container packages for Maps and Sets deal with a concept
called "equivalent keys" or "equivalent elements". This equivalence is
called an "equivalence relation" in A.18.4(5/2) for Maps and A.18.7(5/2)
for Sets.
As I understand it, and as Wikipedia defines it, an equivalence relation
is a reflexive, symmetric and transitive relation: any key or element X
is equivalent to itself; if X is equivalent to Y then Y is equivalent to
X; and if X and Y are equivalent and Y and Z are equivalent then X and Z
are equivalent.
(Wikipedia: http://en.wikipedia.org/wiki/Equivalence_relation)
The concept of equivalent keys or elements is defined differently for
Hashed and Ordered containers. The question discussed here arises only
for Ordered containers. The question is whether the "equivalence"
concept as defined for Ordered containers is certain to be transitive
and thus whether this "equivalence" really is an equivalence relation.
If it is not transitive then some container operations can give strange
results.
For Ordered_Maps (or Ordered_Sets) the RM defines two keys (or elements)
X and Y to be "equivalent" if they are incomparable, that is, if
X < Y is False and Y < X is False, using the actual comparison operator
associated with the generic formal "<" operator at instantiation time.
This is in A.18.6(55/2) and A.18.9(78/2), the paragraphs referenced in
the headers above.
For clarity and brevity I will denote this "equivalence" by the "~"
symbol. Thus the RM definition is:
(X ~ Y) if and only if (X < Y is False and Y < X is False)
The properties of the "~" relation thus depend on the properties of the
actual comparison operator associated with the formal "<" operator. I
will use the symbol "<" for this actual comparison operator too, as
there seems no risk of confusion between the two.
The properties required or expected of "<" are defined in A.18.6(56/2)
for Ordered_Maps and in A.18.9(79/2) for Ordered_Sets. In both cases it
is expected to be "a strict ordering relationship, that is, to be
irreflexive, asymmetric, and transitive". I understand this as follows,
using definitions from Wikipedia:
- irreflexive: X < X is False for all X.
http://en.wikipedia.org/wiki/Irreflexive
- asymmetric: if X < Y is True, then Y < X is False.
http://en.wikipedia.org/wiki/Asymmetric_relation
(This is the "stronger" definition, the "weaker" being only
"not symmetric". This form implies that "<" is irreflexive,
making the requirements on "<" partly redundant.)
- transitive: if X < Y is True and Y < Z is True then X < Z is True.
http://en.wikipedia.org/wiki/Transitive_relation
What do these properties of "<" imply for "~"?
- The "~" relation is obviously symmetric because "and" is
commutative. This does not depend on any properties of "<".
- The "~" relation is reflexive because "<" is irreflexive.
- The "~" relation is *not* necessarily transitive.
The last claim is supported by this example:
type T1 is (X, Y, Z);
function "<" (Left, Right : T1) return Boolean is
-- This function implements the ordering:
-- X < Z
-- No other pairs are comparable.
begin
return Left = X and Right = Z;
end "<";
This function "<" is a strict ordering relation for T1. But the "~"
relation induced by this "<" is not transitive: we have X ~ Y (because
both X < Y and Y < X are False) and Y ~ Z (because both Y < Z and Z < Y
are False) but not X ~ Z (because X < Z is True).
If the above reasoning is correct the conclusion is that the current RM
requirements on the actual "<" operator for an instantiation of
Ordered_Maps or Ordered_Keys are not strong enough to ensure that the
RM-defined "equivalence" concept for keys or elements is an equivalence
relation. This has three negative effects:
1. It confuses the reader to call something an "equivalence"
when it isn't an equivalence relation (because it is not
transitive).
2. As the effects of operations on Maps and Sets are defined in
terms of this "equivalence", the operations may give
strange (unexpected) results.
3. The definitions of some operations become ambiguous.
For an example of negative effect (2), strange results, consider the
function Equivalent_Sets when Ordered_Sets is instantiated with the
following element type and "<" operator:
type T2 is (W, X, Y, Z);
function "<" (Left, Right : T2) return Boolean is
-- This function implements the ordering:
-- W < X < Z
-- W < Y
-- No other pairs are comparable.
begin
case Left is
when W => return Right /= W;
when X => return Right = Z;
when Y => return False;
when Z => return False;
end case;
end "<";
The "equivalence" relation "~" derived from this "<" relation contains
exactly the following pairs (giving True) and no others (all other pairs
give False):
X ~ X, X ~ Y
Y ~ Y, Y ~ X, Y ~ Z
Z ~ Z, Z ~ Y
W ~ W.
Now consider two objects Left and Right of the type Set from this
instance of Ordered_Sets. Let the Left set be {X, Z} and the Right set
be {W, Y}. Note that X ~ Z is False, so the Left set is "legal" with
respect to A.18.6(5/2), and likewise for the Right set. Let's apply the
RM definition A.18.6(22/2) to see what Equivalent_Sets (Left, Right)
should return:
- Left and Right are not the same set object
so we do not return True at once.
- Left and Right have the same number of elements (2)
so we do not return False at once.
- Now we must check each element of Left and look for an
"equivalent" element in Right:
- Check X from Left: we have X ~ Y and Y is in Right.
- Check Z from Left: we have Z ~ Y and Y is in Right.
Thus Equivalent_Sets (Left, Right) returns True.
However, Right contains the element W, which is not "equivalent" to any
element in Left. I feel that this makes it strange to consider Left and
Right to be "equivalent sets". Moreover, it means that Equivalent_Sets
(Right, Left) will return False, showing that Equivalent_Sets is not
symmetric in this case, another strange result.
These strange results are possible because two different elements of the
Left set can be "equivalent" to the *same* element of the Right set,
leaving one or more elements of the Right set unpaired with any
"equivalent" elements of the Left set.
If the "~" relation is transitive, these strange results cannot happen,
which is easily proved as follows. If there are two different elements
L1 and L2 of Left and an element R of Right such that L1 ~ R and L2 ~ R
then by the symmetry and transitivity of "~" we must have L1 ~ L2 which
is impossible because no Set can contain two or more "equivalent"
elements according to A.18.6(5/2).
For an example of negative effect (3), ambiguous definitions, consider
the definition of the "=" operator on Ordered_Maps.Map in A.18.4(22/2)
and (23/2). This definition implicitly assumes that each key in the Left
map is "equivalent" to at most one key in the Right map. This assumption
is false if "equivalence" of keys is not transitive; then there may be
many "equivalent" keys in the Right map that could be matched against
the same key in the Left map, making this definition of "=" ambiguous.
To avoid these problem I think that the RM paragraphs that specify the
properties expected of the actual "<" operators should be extended to
say that "equivalence of keys/elements" must be (is expected to be)
transitive.
Thus, in A.18.6(56/2) the following sentence should be inserted after
the sentence that ends with "transitive": "It should behave in such a
manner that equivalence of keys is a transitive relation between keys."
In A18.9(79/2) the following sentence should be inserted after the
sentence that ends with "transitive": "It should behave in such a manner
that equivalence of elements is a transitive relation between elements."
This question does not arise for Hashed_Maps or Hashed_Sets because the
equivalence of keys or elements is there defined directly by the formal
functions Equivalent_Keys and Equivalent_Elements for which the RM
specifically states that transitivity is expected, as well as
reflexivity and symmetry.
****************************************************************
From: Tucker Taft
Date: Wednesday, January 24, 2007 1:51 PM
This seems like a valid point, though it seems unfortunate
to have to impose requirements on the "equivalence relation"
rather than directly on "<". I'm surprised there isn't
a more directly stated property of "<" that would ensure that
equiv(X,Y) == "not X
****************************************************************
From: Adam Beneschan
Date: Wednesday, January 24, 2007 3:19 PM
> Perhaps something like requiring that "<" satisfies
> the requirement that "X < Y and not Z < Y implies X < Z".
Well, the definition of "transitive" for equivalence relationships is
that
(X equiv Y) and (Y equiv Z) implies (X equiv Z)
so, using the definition of "equivalence" from the RM, this becomes
not (X < Y) and not (Y < X) and not (Y < Z) and not (Z < Y) implies
not (X < Z) and not (Z < X)
which is ugly, but I don't know that it's possible to come up with a
shorter version just in terms of "<".
If you want to introduce a new term "comparable", defined as "A and B
are comparable if either A** If we combine these two and regroup we get
>
> (not X implies (not X
> which is our desired transitivity rule. QED!
>
> Now I suppose this might be overspecifying,
Actually, I don't think so---it appears that the rule you formed is
equivalent to the transitivity rule.
I wrote a small program (in Ada, of course) that assumed there were
three possible elements, A, B, and C, and tested all 64 combinations
of A**** but the
> rule I have suggested seems something that any "reasonable"
> "<" operator ought to satisfy. In words it might
> be phrased that if X and Y are ordered with respect
> to one another, then Z must be ordered with at least
> one of them.
Sounds vaguely familiar :-)
> A reexpression of the requirement that
> more nearly matches these words would be:
>
> For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)
Yep, I added that to my program, and it's equivalent to the other two.
I kind of like this last formulation since it's more symmetrical than
the first one, which IMHO would make it easier to understand and
apply.
****************************************************************
From: Randy Brukardt
Date: Wednesday, January 24, 2007 5:15 PM
> Also, I should mention that I learned all this set theory stuff some
> time ago, and I remember what an equivalence relation, equivalence
> classes, partial orderings, and total orderings are, plus some other
> related concepts; and I've never come across the term "strict weak
> ordering" before this discussion. So while my own personal experience
> isn't necessarily an indicator of anything, I do think we should
> consider how much the term is "common knowledge" before just dumping
> it in the RM with no explanation.
On c.l.a., somone claimed that the term is not commonly used.
In any case, our definition of mathematical terms is given in by the
document defined in 1.3 of the Standard. I tried the online version of it
(www.mathworld.com), and did not find any definition of "strict weak
ordering". That means (presuming I didn't screw up in some way) that we
can't use that term in the RM, as it is not defined by our reference. (Or
we'd have to define it ourselves.)
One of Tucker's fixes looks good enough for this, so I don't think we need
to bother with a definition of "strict weak ordering". (Aside: it would sure
be nice if we could specify such things in the generic contract, rather than
a mess of words. But I don't have any good idea as how to do that.)
****************************************************************
From: Simon Wright
Date: Wednesday, January 24, 2007 5:40 PM
> The SGI STL page has more info:
>
>
Also at http://en.wikipedia.org/wiki/Strict_weak_ordering -- the
Properties section is interesting, very close to Tucker's
suggestions. I think that the 4th bullet is the point at which
Nicholas's bizarre "<" fails to meet the test.
****************************************************************
From: Pascal Leroy
Date: Thursday, January 25, 2007 3:26 AM
Niklas said:
> The question discussed here arises only
> for Ordered containers. The question is whether the "equivalence"
> concept as defined for Ordered containers is certain to be transitive
> and thus whether this "equivalence" really is an equivalence
> relation.
Interesting. The requirements on the formal operators "=" and "<" were
nailed down fairly late in the game, and I suppose that we didn't do our
homework properly. In a private email date June 1st, 2005, I wrote "we
could say that the relation "not (K1 < K2) and not (K2 < K1)" has to be an
equivalence" but apparently we didn't opt for that formulation (I suppose
that we thought that "strict ordering" was sufficient).
For an implementation based on a binary tree to work the "equivalence"
really has to be an equivalence relation, otherwise the result that you
get depends on the order of insertions (because it depends on the
representative that you choose to store in a node).
Tuck replied:
> A reexpression of the requirement that
> more nearly matches these words would be:
>
> For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)
As Simon noticed this exactly matches the notion of "transitivity of
equivalence" as defined by the Wikipedia article on strict weak ordering,
so it does look like the way to go.
Randy commented:
> In any case, our definition of mathematical terms is given in
> by the document defined in 1.3 of the Standard. I tried the
> online version of it (www.mathworld.com), and did not find
> any definition of "strict weak ordering". That means
> (presuming I didn't screw up in some way) that we can't use
> that term in the RM, as it is not defined by our reference.
> (Or we'd have to define it ourselves.)
Well, in practice we define the term "strict ordering" because we write
"it should define a strict ordering relationship, that is, be irreflexive,
asymmetric, and transitive". But we rely on our reference document to
define "irreflexive", "asymmetric", and "transitive".
In A.18(3-4) we give a definition of the big-O notation. It would be
sensible to append there a definition of "strict weak ordering" and use
that term whenever we describe the requirements on "<", with a
cross-reference "see A.18".
****************************************************************
From: Christoph Grein
Date: Thursday, January 25, 2007 6:47 AM
I'm not sure where to put the following definition, but I think it should be included
in the RM.
A has the following properties. For all x, y and z:
For all x, it is not the case that x < x (irreflexivity).
For all x /= y, if x < y then it is not the case that y < x (antisymmetry).
For all x, y, and z, if x < y and y < z then x < z (transitivity).
For all x, y, and z, if x is incomparable with y, and y is incomparable with z,
then x is incomparable with z (transitivity of equivalence).
****************************************************************
From: Niklas Holsti
Date: Monday, January 29, 2007 3:40 PM
>>A reexpression of the requirement that
>>more nearly matches these words would be:
>>
>> For any X,Y,Z in domain of "<": X < Y implies (X < Z or Z < Y)
>
>
> As Simon noticed this exactly matches the notion of "transitivity of
> equivalence" as defined by the Wikipedia article on strict weak ordering,
> so it does look like the way to go.
Expanding the area of discussion a bit, should this requirement for
"weak" ordering, or transivity of equivalence, also apply to the generic
formal "<" operators of Ada.Containers.Vectors.Generic_Sorting,
Ada.Containers.Doubly_Linked_Lists.Generic_Sorting and the generic array
sorting procedures in A.18.16?
I think it should. I tried to understand what the RM means by "sorted
smallest first" in these places, but my attempts at writing this in
terms of "<" between adjacent elements suggest that incomparability
should be transitive here, too. Moreover, googling "Quicksort strict
weak" locates several pages that require "strict weak ordering" for
Quicksort.
Whatever the answer, I suggest that the RM should define what is meant
by "sorted smallest first". It is not obvious, if the definition can use
only the "<" operator and the container can contain incomparable elements.
****************************************************************
From: Tucker Taft
Date: Monday, January 29, 2007 5:36 PM
Yes, I think you are right that the additional
requirement on "<" should apply in these additional
places. Presumably the definition of being sorted
smallest first is that for the final "Sorted" array:
for I,J indices into Sorted:
I < J implies not(Sorted(J) < Sorted(I))
****************************************************************
From: Niklas Holsti
Date: Tuesday, January 30, 2007 2:54 AM
> Yes, I think you are right that the additional
> requirement on "<" should apply in these additional
> places.
Good.
> Presumably the definition of being sorted
> smallest first is that for the final "Sorted" array:
>
> for I,J indices into Sorted:
>
> I < J implies not(Sorted(J) < Sorted(I))
Yes, that is one possible formulation; it describes the Sorted array as
"globally non-decreasing".
An equivalent one, which I prefer because it has no negation and because
it is closer to the words "smallest first" is
for I,J indices into Sorted:
Sorted(I) < Sorted(J) implies I < J
If "<"-equivalence (incomparability) is transitive, it is enough to
compare adjacent elements and check that Sorted(I+1) < Sorted(I) is
False for all indices I in First .. Last-1. This form describes the
Sorted array as "locally non-decreasing". Using this form in the
function Is_Sorted gives O(N) complexity, as one would expect for that
function.
****************************************************************
From: Tucker Taft
Date: Tuesday, January 30, 2007 1:36 PM
> An equivalent one, which I prefer because it has no negation and because
> it is closer to the words "smallest first" is
>
> for I,J indices into Sorted:
>
> Sorted(I) < Sorted(J) implies I < J
That does seem more natural, and would seem to be a worthwhile
addition to the manual as a definition of "smallest first."
****************************************************************
From: Niklas Holsti
Date: Wednesday, January 24, 2007 2:00 PM
!topic Formal "=" for Element_Type in Set containers
!reference Ada 2005 RMA.18.4(3/2)
!reference Ada 2005 RMA.18.7(3/2)
!from Niklas Holsti 07-01-24
!discussion
All generic packages in Ada.Containers for Maps and Sets take a generic
formal "=" operator for the Element_Type. The actual "=" operator for
Element_Type, supplied for the instantiation, is expected to be
deterministic, reflexive and symmetric. This is stated in A.18.4(3/2)
for Maps and in A.18.7(3/2) for Sets.
The Map packages have no other generic formal functions taking
Element_Type values, thus the question to be discussed here does not
arise for Maps.
The Set packages have other generic formal functions taking Element_Type
parameters: Equivalent_Elements and Hash (for Hashed_Sets) and "<" (for
Ordered_Sets). The question I want to raise here is the puzzling fact
that the RM apparently requires and expects no relationship between the
"=" operator on Element_Type and the functions Equivalent_Elements, Hash
and "<".
For example, in the case of two elements E1 and E2 in an Ordered_Set it
would be very natural to require that if E1 "=" E2 then we cannot also
have E1 "<" E2 or E2 "<" E1 (in other words E1 should be "equivalent" to
E2, using the definition of equivalent elements given in A.18.9(78/2)).
But I can find no RM text in A.18 that explicitly says this.
Apart from the RM paragraphs referenced above, I found only the
following mentions of the "=" operator on Element_Type for Sets:
- In A.18.7(20/2), which says that "=" on Set objects uses the
formal "=" on Element_Type. (The Annotated RM says that this
is the only place where "=" for Element_Type should be used.)
- In A.18.9(80/2), which says that elements should not be changed
so that the "=" operator gives a different result (except by
operations in the Sets package itself).
Neither of these paragraphs specifies any relationship between the "="
operator on Element_Type and the other formal functions with
Element_Type parameters.
The "=" operator on Sets is defined in A.18.7(20/2). This definition
requires each element of the Left set to be compared for "=" with the
elements of the Right set, using the formal "=" operator on
Element_Type. If the RM really puts no constraints on the behaviour of
"=" on Element_Type with respect to the functions Equivalent_Elements,
Hash and "<" on Element_Type, two problems arise with this definition of
"=" on Sets:
1. I don't see how the "=" operator on Sets could be implemented in less
than O(N**2) time, because the implementation could not use the Hashed
or Ordered structure to limit the number of Element_Type pairs from Left
and Right that must be compared with "=".
2. The "=" operator on Sets could return True for two sets Left and
Right although the Right set contains elements that are not "=" to any
elements of the Left set. This is because the definition is asymmetric
(it "traverses" Left and "searches" Right), so two different elements of
Left could be "=" to the same element of Right, leaving some elements of
Right unmatched with elements of Left.
I think (and discussion on comp.lang.ada supports it) that the
definition of "=" on Sets expects the implementation to compare only
"equivalent" elements of the Left and Right Sets. If this "equivalence"
is transitive (see my earlier comment on the topic "Equivalent
keys/elements in Ordered Sets and Maps") this means that it is enough to
compare each element of Left with "=" to exactly one element of Right
(or, for Hashed_Sets, to the elements in one bucket of Right). This
makes the "=" operator on Sets take O(N) time, as desired.
As a parenthesis, from discussion on comp.lang.ada it occurs to me that
the RM authors may have intended to specify relationships between the
"=" operator on Element_Type and the other formal functions with
Element_Type parameters *indirectly*, through the properties expected of
these other formal functions, as follows:
For Hashed_Sets A.18.8(66/2) says that the actual Equivalent_Sets
function is expected to be reflexive. According to Wikipedia, a
(mathematical) relation "~" is reflexive if E ~ E for all E, thus on the
Ada level I understand this RM paragraph to expect only that
Equivalent_Sets (E, E) is True for all elements E. Obviously this does
not involve the "=" operator on Element_Type.
However, a *mathematically* equivalent definition of a reflexive
relation "~" is that E1 = E2 implies E1 ~ E2, for all E1 and E2. Perhaps
RM A.18.8(66/2) intends a similar formulation on the Ada level, that is:
for any two elements E1 and E2, if E1 = E2 (using the formal "="
operator) then Equivalent_Elements (E1, E2) is expected to be True. This
would define a natural relationship between the actual "=" for
Element_Type and the actual function for Equivalent_Elements.
The two mathematical definitions of "reflexive relation" are logically
equivalent because mathematics is referentially transparent (I hope I am
using this term more or less correctly): if E1 = E2 then E1 can be
substituted for E2 in any expression that uses E2 (as a free variable)
with no change in the value of the expression.
The two Ada-level definitions of the reflexivity of Equivalent_Elements
are not logically equivalent because Ada expressions are not
referentially transparent, neither with respect to the predefined "="
operators (for example, E1 = E2 does not imply E1'Address = E2'Address)
nor with respect to user-defined "=" operators.
To continue in the same way, for Ordered_Sets A.18.9(79/2) says that the
formal "<" function on Element_Type is expected to be irreflexive.
Following the normal mathematical definition this means on the Ada-level
that E < E is False for all E. As for Hashed_Sets a mathematically
equivalent reformulation can translate this into the Ada-level
expectation that E1 < E2 implies that E1 = E2 is False, using the formal
"<" and "=" operators. Again this would define a natural relationship
between the actual "=" and "<" operators for Element_Type.
The RM authors may not have intended these alternative Ada-level
formulations of "reflexive" and "irreflexive" properties, so I will
close this parenthesis here and proceed to my recommendation.
To enforce or clarify the expected relationship between "=" on
Element_Type and the other formal functions taking Element_Type
parameters, I suggest to add a sentence in A.18.7 saying that the actual
function for the generic function "=" on Element_Type is expected to
return True only when its Left and Right parameters are equivalent
elements in the equivalence relation introduced in paragraph
A.18.7(5/2). In other words, if E1 = E2 is True then E1 and E2 are
expected to be equivalent.
For Hashed_Sets this means that if E1 = E2 is True then so is
Equivalent_Elements (E1, E2), which (by existing rules) means that E1
and E2 have the same Hash value.
For Ordered_Sets this means that if E1 = E2 is True then E1 < E2 is
False and E2 < E1 is also False.
It may be better to reorder the paragraphs in this area so that the new
sentence can be put in the same paragraph with the sentences in the
current paragraph (3/2) that define the other expected properties of the
"=" function. The equivalence relation should be introduced first,
followed by the expected properties of "=".
****************************************************************
From: Pascal Leroy
Date: Thursday, January 25, 2007 4:27 AM
> Neither of these paragraphs specifies any relationship
> between the "=" operator on Element_Type and the other formal functions
with
> Element_Type parameters.
At the time we realized that we needed to be very precise in specifying
the behavior of the formal functions that define the equivalence (Hash and
"<") because they are essential to the integrity of the container.
On the other hand, we thought that, since the formal "=" operator is just
used for "=" of sets and maps, it does not play any role in the integrity
of the container, and that you get what you get. The requirements that we
imposed on "=" only reflect the fact that we didn't want to constrain how
many times the implementation calls it and with what parameters.
> 1. I don't see how the "=" operator on Sets could be
> implemented in less
> than O(N**2) time
I agree that this is not acceptable.
> 2. The "=" operator on Sets could return True for two sets Left and
> Right although the Right set contains elements that are not
> "=" to any
> elements of the Left set. This is because the definition is
> asymmetric
I have never been in love with this asymmetric definition. I suppose that
we could fix the definition, but that would leave us with problem (1)
above. At any rate, the definition makes it clear that the intent was
"scan one set, do a lookup in the other, compare the elements".
> the RM authors may have intended to specify relationships between the
> "=" operator on Element_Type and the other formal functions with
> Element_Type parameters *indirectly*
No, that was not the intent, we missed problem (1) above, but we were not
confused that the formal "=" operator is very different from mathematical
equality.
> I suggest to add a sentence in A.18.7 saying that the actual
> function for the generic function "=" on Element_Type is expected to
> return True only when its Left and Right parameters are equivalent
> elements in the equivalence
I agree that this is the best fix.
Note however that "=" on sets is fraught with difficulties because its
result will depend on the representative of each equivalence class that is
actually stored in the set. If you ever insert two equivalent elements in
a given set, you are treading on thin ice. Two sets that only differ
because they were created in different orders (but with the same elements)
are unlikely to compare equal. Heck, if an implementation internally
introduces some amount of randomization (perhaps due to storage
management) two sets created in the same order might even compare unequal.
So this strange function can be trusted when it return True, not when it
returns False. Hmm.
****************************************************************
From: Matthew Heaney
Date: Thursday, January 25, 2007 12:21 PM
>> 1. I don't see how the "=" operator on Sets could be
>> implemented in less
>> than O(N**2) time
>
> I agree that this is not acceptable.
O(n) is the worst that "=" could ever do. In practice you'll do better
since when equality for elements compares false you terminate the
iteration immediately.
> Note however that "=" on sets is fraught with difficulties because its
> result will depend on the representative of each equivalence class that is
> actually stored in the set. If you ever insert two equivalent elements in
> a given set, you are treading on thin ice.
But the second insertion will be rejected (since an equivalent element
is already in the set), so there is no issue.
****************************************************************
From: Matthew Heaney
Date: Thursday, January 25, 2007 1:18 PM
> The Set packages have other generic formal functions taking Element_Type
> parameters: Equivalent_Elements and Hash (for Hashed_Sets) and "<" (for
> Ordered_Sets). The question I want to raise here is the puzzling fact
> that the RM apparently requires and expects no relationship between the
> "=" operator on Element_Type and the functions Equivalent_Elements, Hash
> and "<".
No, this is incorrect. There certainly *is* an expectation that
equality implies equivalence (even if this is not stated explicitly).
> For example, in the case of two elements E1 and E2 in an Ordered_Set it
> would be very natural to require that if E1 "=" E2 then we cannot also
> have E1 "<" E2 or E2 "<" E1 (in other words E1 should be "equivalent" to
> E2, using the definition of equivalent elements given in A.18.9(78/2)).
> But I can find no RM text in A.18 that explicitly says this.
Equality for sets works by iterating over the left set, and searching
for each element in the right set. If no equivalent element is found
then the operation terminates immediately with False as the result. If
the element is found (meaning that the elements are equivalent), then
the elements are compared for equality. If element equality returns
False, then the operation terminates immediately with the result False.
Otherwise it continues the iteration until you run out of elements.
Equality for sets certainly does NOT work by performing a linear search
for the element. That would be a truly useless operation.
(For the implementation of an ordered set, no search is necessary since
you can simply have a pair of cursors to walk both internal trees
simultaneously.)
> Apart from the RM paragraphs referenced above, I found only the
> following mentions of the "=" operator on Element_Type for Sets:
...
> Neither of these paragraphs specifies any relationship between the "="
> operator on Element_Type and the other formal functions with
> Element_Type parameters.
Equality should imply equivalence. If you define equality differently
then you might not get the result you want.
> 1. I don't see how the "=" operator on Sets could be implemented in less
> than O(N**2) time, because the implementation could not use the Hashed
> or Ordered structure to limit the number of Element_Type pairs from Left
> and Right that must be compared with "=".
No, this is all wrong. A non-linear search is first performed (meaning
that the elements are compared for equivalence using the canonical
mechanisms), and then a test for equality is performed.
You're probably confused because RM05 A.18.7 (20/2) says:
"Otherwise, for each element E in Left, the function returns False if an
element equal to E (using the generic formal equality operator) is not
present in Right."
But this does NOT mean that there's a linear search for an equal item.
It DOES mean that there's a non-linear search for an equivalent item,
followed by a test for equality.
> 2. The "=" operator on Sets could return True for two sets Left and
> Right although the Right set contains elements that are not "=" to any
> elements of the Left set. This is because the definition is asymmetric
> (it "traverses" Left and "searches" Right), so two different elements of
> Left could be "=" to the same element of Right, leaving some elements of
> Right unmatched with elements of Left.
You're confused because you seem to think that element equality is used
to implement the search. But that's wrong: element equality is only
used *after* the search (which finds an equivalent item).
> However, a *mathematically* equivalent definition of a reflexive
> relation "~" is that E1 = E2 implies E1 ~ E2, for all E1 and E2. Perhaps
> RM A.18.8(66/2) intends a similar formulation on the Ada level, that is:
> for any two elements E1 and E2, if E1 = E2 (using the formal "="
> operator) then Equivalent_Elements (E1, E2) is expected to be True. This
> would define a natural relationship between the actual "=" for
> Element_Type and the actual function for Equivalent_Elements.
Yes, that's correct. If element equality returns True, then
Equivalent_Elements should return True.
Of course, if you define element equality differently, then equality on
sets might not return the value you want.
If you want set equality to imply set equivalence, then you have to
define the generic actual for element equality such that it only returns
True when the elements are equivalent.
> To enforce or clarify the expected relationship between "=" on
> Element_Type and the other formal functions taking Element_Type
> parameters, I suggest to add a sentence in A.18.7 saying that the actual
> function for the generic function "=" on Element_Type is expected to
> return True only when its Left and Right parameters are equivalent
> elements in the equivalence relation introduced in paragraph
> A.18.7(5/2). In other words, if E1 = E2 is True then E1 and E2 are
> expected to be equivalent.
Why is this necessary? If element equality does not imply equivalence
then set equality will not imply set equivalence. What's the problem?
****************************************************************
From: Tucker Taft
Date: Thursday, January 25, 2007 5:46 PM
> Equality for sets works by iterating over the left set, and searching
> for each element in the right set. If no equivalent element is found
> then the operation terminates immediately with False as the result.
That isn't what the RM says in A.18.7(20). It makes no mention of
equivalence. If we want the above to be a legitimate
implementation, then we need to include a mention of
equivalence in A.18.7(20) or add a requirement that
"=" on elements implies equivalence or the result of "="
on sets is unspecified.
> Equality should imply equivalence. If you define equality differently
> then you might not get the result you want.
But the RM has to say that somewhere.
>> ...
>> To enforce or clarify the expected relationship between "=" on
>> Element_Type and the other formal functions taking Element_Type
>> parameters, I suggest to add a sentence in A.18.7 saying that the
>> actual function for the generic function "=" on Element_Type is
>> expected to return True only when its Left and Right parameters are
>> equivalent elements in the equivalence relation introduced in
>> paragraph A.18.7(5/2). In other words, if E1 = E2 is True then E1 and
>> E2 are expected to be equivalent.
>
> Why is this necessary? If element equality does not imply equivalence
> then set equality will not imply set equivalence. What's the problem?
The problem is that the definition of "=" on sets does not
mention equivalence, nor is there any mention of the relationship
between "=" on elements and equivalence, so you as an implementor
can't use equivalence to implement set "=". If you want to use
equivalence, then the RM needs to be augmented with words that
authorize that implementation.
So I agree with the original comment by Niklas, that we need to
add something to the RM wording in this section.
****************************************************************
From: Matthew Heaney
Date: Thursday, January 25, 2007 9:13 PM
> > Equality for sets works by iterating over the left set, and searching
> > for each element in the right set. If no equivalent element is found
> > then the operation terminates immediately with False as the result.
>
> That isn't what the RM says in A.18.7(20). It makes no
> mention of equivalence. If we want the above to be a
> legitimate implementation, then we need to include a mention
> of equivalence in A.18.7(20) or add a requirement that "=" on
> elements implies equivalence or the result of "="
> on sets is unspecified.
OK, I'm certainly in favor of the RM requiring (or otherwise mentioning)
that element equality must imply element equivalence. The explicit
requirement would allow me to put some assertions in GNAT to verify the
implication. (In GNAT if you compile a container instantiation with
assertions enabled, you get a bunch of extra checks. That's how I detect
dangling cursors, for example.)
> > Equality should imply equivalence. If you define equality differently
> > then you might not get the result you want.
>
> But the RM has to say that somewhere.
OK.
> > Why is this necessary? If element equality does not imply equivalence
> > then set equality will not imply set equivalence. What's the problem?
>
> The problem is that the definition of "=" on sets does not
> mention equivalence, nor is there any mention of the
> relationship between "=" on elements and equivalence, so you
> as an implementor can't use equivalence to implement set "=".
OK.
****************************************************************
From: Pascal Leroy
Date: Friday, January 26, 2007 1:17 AM
> >> 1. I don't see how the "=" operator on Sets could be
> >> implemented in less
> >> than O(N**2) time
> >
> > I agree that this is not acceptable.
>
> O(n) is the worst that "=" could ever do.
Duh? For an ordered set I don't see how you can do better than
O(n*Log(n)) since the search is going to require O(Log(n)) comparisons
anyway.
> In practice you'll do better
> since when equality for elements compares false you terminate the
> iteration immediately.
Matt, this is irrelevant to the asymptotic complexity. It doesn't help
that in some cases you'll stop after the first comparison. If the sets
compare equal surely you'll have to check all the elements. And if they
don't, on average you'll have to check n/2 elements.
> > Note however that "=" on sets is fraught with difficulties because its
> > result will depend on the representative of each equivalence class
> > that is actually stored in the set. If you ever insert two equivalent
> > elements in a given set, you are treading on thin ice.
>
> But the second insertion will be rejected (since an equivalent element
> is already in the set), so there is no issue.
Not if you call Include or the 4-parameter Insert to add the element to
the set.
The scenario is this: you have a collection of elements, and you add them
to two sets, perhaps using Include, in different order. Chances are that
the resulting sets will not compare equal, even though they come from the
same elements. (They will be equivalent, though.)
Because of this counter-intuitive semantics, I believe that "=" on sets is
going to lead to bugs in user code. Most people won't realize that they
want to use Equivalent_Sets, not "=", to get a reliable answer. The "="
operator as defined is junk and should not have been included in the
package. Sigh.
****************************************************************
From: Matthew Heaney
Date: Friday, January 26, 2007 7:57 AM
> Duh? For an ordered set I don't see how you can do better than
> O(n*Log(n)) since the search is going to require O(Log(n))
> comparisons anyway.
Yes, you're right. I was wearing my implementation hat, since for an
ordered set you can simply do an pairwise comparison; no O(log(n)) search is
necessary.
> > In practice you'll do better
> > since when equality for elements compares false you terminate the
> > iteration immediately.
>
> Matt, this is irrelevant to the asymptotic complexity. It
> doesn't help that in some cases you'll stop after the first
> comparison. If the sets compare equal surely you'll have to
> check all the elements. And if they don't, on average you'll
> have to check n/2 elements.
Right.
> > But the second insertion will be rejected (since an equivalent element
> > is already in the set), so there is no issue.
>
> Not if you call Include or the 4-parameter Insert to add the
> element to the set.
I still don't see this. The 4-param Insert doesn't change anything, it
simply reports whether the insertion succeeds using a Boolean instead of an
exception.
> The scenario is this: you have a collection of elements, and
> you add them to two sets, perhaps using Include, in different
> order. Chances are that the resulting sets will not compare
> equal, even though they come from the same elements. (They
> will be equivalent, though.)
But wouldn't this only be true for a multi-set (and only if you don't
specify where the new item goes relative to existing equivalent items)? In
our case (a unique-element set), if an equivalent item is already in the
set, then the insert will fail. (Unless you're talking about Include, but
then the new item would replace what's there, so the same element will still
be in both sets.)
> Because of this counter-intuitive semantics, I believe that
> "=" on sets is going to lead to bugs in user code. Most
> people won't realize that they want to use Equivalent_Sets,
> not "=", to get a reliable answer. The "="
> operator as defined is junk and should not have been included
> in the package. Sigh.
I still don't understand this. Nor do I understand how you could get rid of
set equality if the set type is non-limited.
****************************************************************
From: Pascal Leroy
Date: Friday, January 26, 2007 8:31 AM
> I still don't see this. The 4-param Insert doesn't change
> anything, it simply reports whether the insertion succeeds
> using a Boolean instead of an exception.
>
> But wouldn't this only be true for a multi-set (and only if
> you don't specify where the new item goes relative to
> existing equivalent items)? In our case (a unique-element
> set), if an equivalent item is already in the set, then the
> insert will fail. (Unless you're talking about Include, but
> then the new item would replace what's there, so the same
> element will still be in both sets.)
An example might help: consider a set of strings, and assume that the "<"
is case-insensitive lexicographic comparison (so equivalence is
case-insensitive equality). Equality of strings is, of course,
case-sensitive.
Now consider the insertions:
Include (S1, "Ada");
Include (S1, "ADA");
Include (S2, "ADA");
Include (S2, "Ada");
After these calls you have S1={"ADA"} and S2={"Ada"} so S1 and S2 are
equivalent (fine) but not equal. Surely the semantics is well-defined,
but I find it extremely counter-intuitive, given that the two sets were
created from the same string literals. In real life it might not be
obvious that the insertions happened in different orders, and anyone
depending on equality of sets might get some surprising results. (To make
life more interesting, imagine that the insertions of "Ada" and the
insertions of "ADA" happen on different tasks, with proper
synchronization, but in a non-deterministic order.)
Of course, the same problem may happen with the 4-parameter Insert (except
that in this case the first insertion, not the last one, wins).
> Nor do I understand how you
> could get rid of set equality if the set type is non-limited.
This is water under the bridge now, but I think that "=" should have been
a mere renaming of Equivalent_Sets.
****************************************************************
From: Tucker Taft
Date: Friday, January 26, 2007 10:05 AM
> This is water under the bridge now, but I think that "=" should have been
> a mere renaming of Equivalent_Sets.
Your points above are valid, but I don't agree with your conclusion.
If two sets are equal, then I would hope there are essentially
indistinguishable. But these are clearly distinguishable,
because if you iterate through the two sets, you get different
things. If we wanted to achieve your implied goal that,
if you insert the same strings, but in different orders, you
end up with the same result, we would need to provide another
formal function that would allow the implementation to compare
two equivalent elements and decide which one to put in the set.
But we clearly don't have that. So I think the distinction
between Equivalent_Elements and element "=" is directly analogous
to the Equivalent_Sets and set "=", and so I think (once we
make it clear that equal-element should imply equivalent-element),
that we have the right semantics here.
****************************************************************
From: Matthew Heaney
Date: Friday, January 26, 2007 10:22 PM
> An example might help: consider a set of strings, and assume that the "<"
> is case-insensitive lexicographic comparison (so equivalence is
> case-insensitive equality). Equality of strings is, of course,
> case-sensitive.
Yes, the predefined equality operator for type String is case sensitive,
but if you have a case insensitive less-than operator then you have to
match it with a case insensitive equality operator in the instantiation
(if case-insensitive set equality is what you want).
> Now consider the insertions:
>
> Include (S1, "Ada");
> Include (S1, "ADA");
> Include (S2, "ADA");
> Include (S2, "Ada");
>
> After these calls you have S1={"ADA"} and S2={"Ada"} so S1 and S2 are
> equivalent (fine) but not equal.
But it depends on the generic actual equality operator for element type.
The fact that it's type String doesn't make any difference.
> Surely the semantics is well-defined,
> but I find it extremely counter-intuitive, given that the two sets were
> created from the same string literals.
But if the generic actual equality operator is case sensitive, then the
sets are going to compare unequal. Why do you think that that's
counter-intuitive?
If you want the sets to compare equal, then you have to provide a
generic actual equality operator that's case insensitive.
In GNAT, I augmented then container library with the following functions:
Ada.Strings.Less_Case_Insensitive
Ada.Strings.Equal_Case_Insensitive
Ada.Strings.Hash_Case_Insensitive
If you want string sets to compare equal, then you simply have to say:
with Ada.Containers.Indefinite_Ordered_Sets;
with Ada.Containers.Less_Case_Insensitive;
with Ada.Containers.Equal_Case_Insensitive;
package String_Sets is
new Ada.Containers.Indefinite_Ordered_Sets
(String,
"<" => Ada.Containers.Less_Case_Insensitive,
"=" => Ada.Containers.Equal_Case_Insensitive);
and now all is well.
> In real life it might not be
> obvious that the insertions happened in different orders, and anyone
> depending on equality of sets might get some surprising results.
If the general actual equality operator is case sensitive, then there is
no surprise: the sets compare unequal.
If this would cause surprise, then you solve that problem by supplying a
case insensitive element equality operator too.
This issue is exactly why we have a generic formal element equality
operator. It's up to the developer to supply a generic actual that does
not cause surprise (where "surprise" is defined by him).
>> Nor do I understand how you
>> could get rid of set equality if the set type is non-limited.
>
> This is water under the bridge now, but I think that "=" should have been
> a mere renaming of Equivalent_Sets.
I disagree. I think we made exactly the right decision by declaring a
generic formal equality operator for the element type, and leaving it up
to the developer to pass in (or not) whatever generic actual makes sense
for his application.
****************************************************************
From: Pascal Leroy
Date: Monday, January 29, 2007 2:26 AM
> If you want string sets to compare equal, then you simply have to say:
>
> package String_Sets is
> new Ada.Containers.Indefinite_Ordered_Sets
> (String,
> "<" => Ada.Containers.Less_Case_Insensitive,
> "=" => Ada.Containers.Equal_Case_Insensitive);
If you believe that the "=" and "<" should be consistent, then:
1 - Why the hell do we require that the user pass in "=" when we can
deduce equivalence from the given "<"?
2 - This is a requirement on the user, and it should be spelled out in the
RM.
We agreed earlier that equality should imply equivalence. My
understanding is that you are now saying that equality should be the same
as equivalence, otherwise you'll have dependencies on the order of
insertion.
****************************************************************
From: Pascal Leroy
Date: Monday, January 29, 2007 2:32 AM
> If two sets are equal, then I would hope there
> are essentially indistinguishable. But these are clearly
> distinguishable, because if you iterate through the two sets,
> you get different things.
Yes, you can always distinguish sets through iterations, but that doesn't
bother me too much, since you are in some sense "dissecting" the set. In
general, the operations of this package that deal with entire sets (not
doing any dissection) rely on element equivalence, not element equality.
This makes it possible to preserve the natural mathematical invariants,
for instance the fact that the intersection is always a subset of the
union.
On the other hand, when you start using "=", all bets are off regarding
these mathematical invariants. The fact that two sets are equal, for
instance, doesn't mean that their symmetric difference is empty. I think
this is treacherous.
> If we wanted to achieve your
> implied goal that, if you insert the same strings, but in
> different orders, you end up with the same result, we would
> need to provide another formal function that would allow the
> implementation to compare two equivalent elements and decide
> which one to put in the set. But we clearly don't have that.
That's not what I am trying to achieve. It would not be very practical,
because deciding how to canonicalize the set could be hard. But I think
that, as a set-theoretical operation, "=" is badly broken.
> So I think the distinction between Equivalent_Elements and
> element "=" is directly analogous to the Equivalent_Sets and
> set "=", and so I think (once we make it clear that
> equal-element should imply equivalent-element), that we have
> the right semantics here.
I guess that discussions in the abstract don't go anywhere. I would like
to see a practical example where the distinction between "=" and
Equivalent_Sets is actually useful.
We are obviously not going to change the specs of these packages now. But
I would like to have a note warning the user that, if element equality and
element equivalence are not the same, the result of set equality will in
general depend on the sequence of calls used to create each set.
****************************************************************
From: Matthew Heaney
Date: Monday, January 29, 2007 9:21 AM
> If you believe that the "=" and "<" should be consistent, then:
They should be consistent if you want set equality to imply set equivalence.
> 1 - Why the hell do we require that the user pass in "=" when we can
> deduce equivalence from the given "<"?
I don't understand this. You can deduce equivalence from "<", but you
cannot deduce equality, so you have to pass in equality too.
> 2 - This is a requirement on the user, and it should be spelled out in the
> RM.
The only requirement on the user is that if they want set equality to
imply set equivalence, then the user is required to pass in an equality
operator such that element equality implies element equivalence.
> We agreed earlier that equality should imply equivalence.
Yes, but only if set equality is to imply set equivalence. The user
decides whether this is true. (And if he decides in the affirmative,
then he has the obligation to pass in a suitable element equality operator.)
> My
> understanding is that you are now saying that equality should be the same
> as equivalence, otherwise you'll have dependencies on the order of
> insertion.
No, that's not what I'm saying at all. I'm saying that if set equality
is to imply set equivalence, then element equality must imply element
equivalence. That's very different from saying that "equality should be
the same as equivalence."
****************************************************************
From: Niklas Holsti
Date: Monday, January 29, 2007 12:39 PM
>> 2 - This is a requirement on the user, and it should be spelled out in
>> the RM.
>
>
> The only requirement on the user is that if they want set equality to
> imply set equivalence, then the user is required to pass in an equality
> operator such that element equality implies element equivalence.
More than that: if element equality does not imply element equivalence,
the fast algorithm to test equality of sets is wrong because it compares
only equivalent elements; an O(N**2) algorithm would be needed.
****************************************************************
From: Tucker Taft
Date: Monday, January 29, 2007 1:37 PM
Before we get too confused by the philosophical discussion,
I think we all now agree that we *do* need to make a
substantive change in the RM wording to state that:
if element "=" does *not* imply element equivalence then
the result of set-wise "=" is not specified.
This is required to allow implementations to compare only
equivalent elements to determine the result of set-wise "=".
****************************************************************
**