!standard G.3.1 08-01-17 AI05-0047-1/04
!standard G.3.2
!class binding interpretation 07-04-04
!status work item 07-04-04
!status received 07-04-04
!priority Medium
!difficulty Easy
!qualifier Clarification
!subject Annoyances in the array packages
!summary
It is clarified that subcubic methods cannot be used for the implementation
of matrix multiplication using "*" in the strict mode.
Some corrections to the specification of the Eigensystem procedures are
made and AARM notes added confirming that there is no requirement on the
absolute direction of eigenvectors. The possibility of Constraint_Error
being raised is added.
Other AARM notes are added and a typographical error corrected.
!question
(These questions were raised by an implementor.)
1 - The multiplication of matrices is defined to "involve inner products"
(G.3.1(56/2)). In strict mode, this requires that each component of the
result comply with the requirements of G.3.1(83/2-84/2). Technically
this is known as a "componentwise" error bound. An algorithm that gives
a componentwise error bound is necessarily cubic (i.e., in O(N**3)).
There exist, however, algorithms for multiplying matrices that are sub-cubic
(e.g., Strassen's method), but these algorithms give normwise error bounds,
meaning that the error in one component may spill over into other components.
Some variants of BLAS and other widely-used linear algebra libraries use
fast matrix multiplication algorithms, so the accuracy requirements make
it impossible to implement Generic_Real_Arrays by interfacing to these
libraries. Was this intended? (No.)
2 - The definition of Eigensystem does not impose any constraint on the length
of the out parameters Values and Vectors. Instead it has the mysterious
sentence "The index ranges of the parameter Vectors are those of A". It is
written as if Eigensystem had a way to change the constraints of Vectors,
which is evidently false. It would seem that it should require that
Values'Length, Vectors'Length(1) and Vectors'Length(2) be all equal and
equal to A'Length(1). Is this true? (Yes.)
3 - For some matrices, the QR iteration used to compute the eigenvalues will
just not converge (no deflation will happen). While it is always possible
to let it run forever, it is typical in this case to give up after some
number of iterations. The RM makes no provision for raising an exception.
The values computed for (some of) the eigenvalues may be really bogus.
Would it not be better to raise an exception than to return garbage to the
user? (Yes.)
4 - The index subtype for types Real_Vector and Real_Matrix is Integer.
Presumably this was intended to provide maximum flexibility in selecting
the index range. It has, however, unpleasant consequences. Consider:
Identity: constant Real_Matrix := ((1.0, 0.0), (0.0, 1.0));
Anyone trying to evaluate Identity'First - 1 won't like the result. Why
would anyone do that? Maybe to initialize a variable that will be used to
iterate through the rows or columns of the matrix. This is sure to bite many
users, and using a slightly narrower index subtype would have been much
wiser. Do you agree? (Not entirely.)
5 - G.3.2(75/2, 76/2) defines a function "abs" that returns the Hermitian L2-
norm of a vector. However, the specification of this function is given as:
function "abs" (Right : Complex_Vector) return Complex;
The norm of a vector is always a (nonnegative) real number, so it doesn't
make much sense to return a complex number here. This function should
return a Real'Base. It was correct in AI95-00418, but it seems that this
was incorrectly merged into the RM and Amendment. Is this true? (Yes.)
6 - Section G.3.2 keeps talking about inner product, but never defines exactly
what is meant by this term. This is significant because in a complex vector
space the natural inner product is the Hermitian one, where the elements of
the second vector are conjugated. It is unclear if the function "*"
conjugates the elements of Right. ISO/IEC 13813 explicitly specified
that "no complex conjugation is performed". While the "*" operator
defined by such a rule is not a true inner product, it is probably more
appropriate in practice as it makes the conjugations explicit in the
source: the user has to write X * Conjugate (Y) which mimics the
mathematical notation where conjugation is always made explicit. Would
a clarification be useful? (Yes.)
7 - The eigenvectors returned by Eigensystem in Generic_Real_Arrays
and Generic_Complex_Arrays are not well specified, and this has
unfortunate effects. One problem is that the result of these subprograms
is hard to test. A more serious problem is that different implementations
may return different eigenvectors, and this may cause portability
problems. Also, even in a single implementation, changes to the internal
algorithms, or to the code generated for the annex G generics, may cause
the result of Eigensystem to change. This non-determinism is unpleasant.
Consider first the real case. Given a set of orthonormal eigenvectors,
changing the signs of any subset of these vectors also results in a set of
orthonormal eigenvectors (this is easily seen geometrically: it amounts to
changing the direction of the vectors in an orthonormal basis). It would
be nice to be more specific.
Consider now the complex case. Things are nastier here because there
is much more freedom: multiplying each vector by a complex number of
modulus 1 leaves the set of eigenvectors orthonormal. Again, it would be
nice to lift the ambiguity.
Was this non-determinism intended? (Yes.)
!recommendation
(See Discussion.)
!wording
Modify G.3.1(78/2) as follows
... Constraint_Error is raised if A'Length(1) is not equal to A'Length(2)
{, or if Values'Range is not equal to A'Range(1), or if the}[. The] index
ranges of the parameter Vectors are {not equal to }those of A. ...
{Constraint_Error is also raised in implementation-defined circumstances
if the algorithm used does not converge quickly enough.}
Add after G.3.1(78/2)
AARM NOTE
There is no requirement on the absolute direction of the returned
eigenvectors. Thus they might be multiplied by -1. It is only the ratios
of the components that matter. This is standard practice.
Insert after AARM Note G.3.1(86.b/2)
Note moreover that the componentwise accuracy requirements are not met
by subcubic methods for matrix multiplication such as that devised by
Strassen. These methods, which are typically used for the fast
multiplication of very large matrices (e.g., order > 1000), have normwise
accuracy properties. If it is desired to use such methods, then distinct
subprograms should be provided, perhaps in a child package. See Section
22.2.2 in the above reference.
Insert after G.3.1(90/2)
An implementation should minimize the circumstances under which the
algorithm used for Eigenvalues and Eigensystem fails to converge.
AARM NOTE
J. H. Wilkinson is the acknowledged expert in this area. See for example
Wilkinson, J. H., and Reinsch, C., Linear Algebra, Vol. II of Handbook
for Automatic Computation, Springer-Verlag.
Insert after G.3.2(56/2)
AARM NOTE
An inner product never involves implicit complex conjugation. If the
product of a vector with the conjugate of another (or the same) vector
is required then this has to be stated explicitly by writing for
example X * Conjugate(Y). This mimics the usual mathematical notation.
Modify G.3.2(75/2) as follows
function "abs" (Right : Complex_Vector) return [Complex]{Real'Base};
Modify G.3.2(146/2) as follows
... Constraint_Error is raised if A'Length(1) is not equal to A'Length(2)
{, or if Values'Range is not equal to A'Range(1), or if the}[. The] index
ranges of the parameter Vectors are {not equal to }those of A. ...
{Constraint_Error is also raised in implementation-defined circumstances
if the algorithm used does not converge quickly enough.}
Add after G.3.2(146/2)
AARM NOTE
There is no requirement on the absolute direction of the returned
eigenvectors. Thus they might be multiplied by any complex number whose
modulus is 1. It is only the ratios of the components that matter. This
is standard practice.
Insert after G.3.2(160/2)
An implementation should minimize the circumstances under which the
algorithm used for Eigenvalues and Eigensystem fails to converge.
AARM NOTE
J. H. Wilkinson is the acknowledged expert in this area. See for example
Wilkinson, J. H., and Reinsch, C., Linear Algebra, Vol. II of Handbook
for Automatic Computation, Springer-Verlag, or Wilkinson, J. H., The
Algebraic Eigenvalue Problem, Oxford University Press.
!discussion
These notes are based on discussions with the implementor concerned,
staff at the Numerical Algorithms Group at Oxford and the National
Physical Laboratory as well as browsings at the British Library and
various web sites.
It has been suggested that all implementations of these packages will
simply interface to implementations of the BLAS. This is not true.
The implementor who raised these questions implemented the packages from
scratch. It
should also be pointed out that the BLAS are simply a set of
specifications and one would have to interface to a particular
implementation.
1 The accuracy given for the inner product is met by classical techniques
for the multiplication of matrices but as stated in the question is not
met by fast techniques using subcubic methods such as that devised by
Strassen.
Our first reaction to this question was to propose relaxing the accuracy
requirements perhaps by just saying that there is no requirement for
matrix multiplication (but keeping the requirement for vector
multiplication) or giving alternatives according to whether
componentwise or normwise accuracy is appropriate. Further deliberation
has led to a different conclusion.
One point is that subcubic methods are considered to be less stable.
Moreover, they are only worthwhile in terms of time for very large
matrices (certainly with size greater than 100 and possibly greater than
1000 depending on the architecture of the hardware). Accordingly, since
Ada's general goal is to provide reliability, if it is desired to use
such methods then an implementation should provide a distinct subprogram
for this perhaps in a child package.
An alternative approach might simply be to say that these packages are
implemented in the relaxed mode if subcubic methods are desired. However,
it is not clear (see G(3) and G(4)) whether it is permissible to
implement part of this annex in the strict mode and part in the relaxed
mode.
Accuracy requirements for Strassen's method are discussed by Higham in
Accuracy and Stability of Numerical Algorithms. They are based on the
overall matrix norm and take the form k*||A||*||B||. However, the vital
constant k is complicated and depends not only on the size of the matrices
but also on the depth of recursion involved (below a certain size
conventional multiplication is used). Moreover, although an error bound
can be given it can be a large overestimate (an example of a factor of
1800000 for a matrix of size 1024 is given).
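For concreteness, Strassen's scheme replaces the eight half-size products of
conventional blocked multiplication by seven, at the cost of extra additions;
because the quadrant sums mix components before multiplying, only a normwise
error bound survives. The following Python sketch is purely illustrative and
is no part of the Ada packages or of any BLAS; it shows the recursion for
square matrices whose order is a power of two.

```python
# Illustrative sketch of Strassen's subcubic matrix multiplication
# (seven recursive products instead of eight) for square matrices
# whose order is a power of two.  Pure Python, lists of lists.

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def sub(A, B):
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    # Split both operands into quadrants.
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    # The seven Strassen products; each sum mixes components,
    # which is what destroys the componentwise error bound.
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    # Reassemble the four result quadrants row by row.
    return ([C11[i] + C12[i] for i in range(h)] +
            [C21[i] + C22[i] for i in range(h)])
```

A real implementation would switch to conventional multiplication below a
crossover size; this sketch recurses all the way down for clarity.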
It should also be observed that the implementor who raised the question
used conventional multiplication and indeed an extremely accurate form
of inner product which is guaranteed to be accurate to the last bit of
the representation. See Accurate Floating-Point Summation by Rump, Ogita
and Oishi, Technical Report 05.1 from the Faculty for Information and
Communication Science, Hamburg University of Technology. Thus the
question raised here is of a philosophical nature.
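The flavour of such accurate inner products can be suggested, though not
reproduced, by a much simpler compensated scheme. The following Python
sketch applies Kahan summation to the running sum of products; it is not
the Rump-Ogita-Oishi algorithm (the products themselves are still rounded),
but it illustrates the idea of carrying a correction term for lost
low-order bits.

```python
# A compensated dot product in the spirit of (but much simpler than)
# the error-free-transformation algorithms of Rump, Ogita and Oishi:
# Kahan summation applied to the running sum of products.

def dot_naive(x, y):
    s = 0.0
    for a, b in zip(x, y):
        s += a * b
    return s

def dot_compensated(x, y):
    s = 0.0   # running sum
    c = 0.0   # running compensation for lost low-order bits
    for a, b in zip(x, y):
        term = a * b - c
        t = s + term
        c = (t - s) - term   # algebraically zero; captures rounding error
        s = t
    return s
```

On well-scaled data the compensated result is at least as close to the
exact sum as the naive one; a guarantee to the last bit, as in the cited
report, needs the full error-free transformations.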
The question is also phrased somewhat oddly. It suggests that insisting
on component-wise accuracy for inner product prevents interfacing to
an implementation of the BLAS. That is not true. One can implement
inner product the traditional way thereby giving the accuracy specified
and then use an implementation of the BLAS for matters such as the
determination of eigensystems.
A brief AARM note is added on Strassen's method.
2 The RM was incorrectly worded for the subprograms Eigensystem. The
intent was that the out parameter supplied for Values should have the
same range as the result of the function Eigenvalues. And similarly the
out parameter for Vectors should have the same ranges as the matrix A.
It seems more elegant to make the ranges the same rather than just
require that the lengths be the same.
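The corrected checks can be modelled outside Ada as follows; this Python
sketch is illustrative only (index ranges are modelled as (first, last)
pairs, and the names are not part of the Ada specification).

```python
# A model of the checks the corrected wording requires of Eigensystem:
# A must be square, Values'Range must equal A'Range(1), and the index
# ranges of Vectors must equal those of A.

def check_eigensystem_constraints(a_range_1, a_range_2,
                                  values_range,
                                  vectors_range_1, vectors_range_2):
    def length(r):
        return max(0, r[1] - r[0] + 1)
    if length(a_range_1) != length(a_range_2):
        raise ValueError("Constraint_Error: A is not square")
    if values_range != a_range_1:
        raise ValueError("Constraint_Error: Values'Range /= A'Range(1)")
    if (vectors_range_1, vectors_range_2) != (a_range_1, a_range_2):
        raise ValueError("Constraint_Error: Vectors ranges /= those of A")
```

Equal lengths alone do not satisfy the check; the ranges themselves must
match, which is the point of requiring ranges rather than lengths.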
3 It has been said that the (somewhat slow) Jacobi method never fails
and similarly for the QR method with so-called Wilkinson shift (this is
for real symmetric or Hermitian matrices - general matrices can indeed
cause problems). Thus in principle it ought not to be necessary ever to
raise an exception. However, it is an imperfect world and so permission
is granted to raise Constraint_Error if convergence is elusive. Note also
that the NAG Fortran library does provide an exit error condition but
no instance of it ever being used has been recorded.
4 The instructions given were to provide the functionality of 13813 and
so Integer was chosen as the base range. If a shorter range had been
deemed appropriate then we have a difficult debate over whether it should
be Natural or Positive. Positive would seem to be more common, but it
would be a nuisance for users who wish to use a zero lower
bound. This happens in many integration applications. Thus if we use an
N by N mesh then it is natural to index the N+1 points from 0 to N.
Moreover, some applications might wish to use a range such as -N to +N.
On reflection it might have been wiser to make the index ranges a further
generic parameter. But then we would have had to make a decision for the
preinstantiated versions. So no change is made in this area.
5 As noted, the specification of abs for the Hermitian L2-norm should
return Real'Base and not Complex.
6 The wording "no complex conjugation is performed" was removed because
it seemed obvious. As the questioner notes, having to do any conjugation
explicitly makes it clearer and mimics the usual mathematical notation.
An AARM note is added.
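The difference between the two notions of product can be sketched in Python
(the function names are illustrative, not the Ada specification):

```python
# The "*" of G.3.2, with NO implicit conjugation, versus the Hermitian
# inner product obtained by conjugating explicitly, as the Ada user
# must write X * Conjugate(Y).

def product(x, y):
    """Sum of componentwise products, with no conjugation (as in "*")."""
    return sum(a * b for a, b in zip(x, y))

def conjugate(y):
    return [b.conjugate() for b in y]

def hermitian_inner(x, y):
    """The true inner product: the conjugation is made explicit."""
    return product(x, conjugate(y))
```

For the vector X = (i, 1), X * X is zero although X is nonzero, whereas
the explicitly conjugated product gives the squared norm 2; this is why
"*" by itself is not a true inner product.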
7 A survey of the specifications of libraries providing eigensystem
routines shows that it is standard practice not to specify the "absolute
direction" of eigenvectors. All that matters is the ratio of the
components. For test purposes and comparing implementations it is thus
only the ratios that should be compared. An AARM note is added.
It is worth considering what alternative approaches might have been taken.
First one could stipulate that for each vector, the element with largest
absolute value should be positive (or in the complex case, both real
and positive). If two elements have the same absolute value then that
with lowest index could be taken. However, in the case where two elements
have nearly equal maximal absolute values but of opposite signs, there is a
risk that different implementations might return quite different results.
Thus suppose one eigenvector (for n=4) is (0.7, -0.7, 0.1, 0.1) and
suppose two different implementations employing different algorithms return
(0.7000001, -0.69999999, ...) and (0.69999999, -0.70000001, ...) respectively.
The algorithm suggested will change the sign of the second result but not
the first. So the results will seem to be widely different. However, if we
just compare the ratios of the components then it will be clear that the
results are essentially the same. Similar difficulties arise with other
approaches such as taking the first non-zero element to be positive. Further
difficulties arise if two eigenvalues are identical since then any two
orthonormal vectors in the 2-space concerned might be returned. It seems
that the unpleasantness mentioned by the implementor is inherent in the
problem.
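The sign rule rejected above, and the ratio-based comparison recommended
instead, can be sketched in Python (illustrative names; the tolerance is
arbitrary):

```python
# The candidate rule: make the component of largest absolute value
# positive, breaking ties by lowest index.  Contrast it with comparing
# eigenvectors only up to an overall sign factor.

def normalize_sign(v):
    # Index of the component of largest absolute value, lowest index
    # winning ties (the -i in the key prefers smaller indices).
    pivot = max(range(len(v)), key=lambda i: (abs(v[i]), -i))
    return [-x for x in v] if v[pivot] < 0.0 else list(v)

def same_up_to_sign(u, v, tol=1e-6):
    """Compare eigenvectors the robust way: up to an overall sign."""
    return (all(abs(a - b) <= tol for a, b in zip(u, v)) or
            all(abs(a + b) <= tol for a, b in zip(u, v)))
```

Applied to the two hypothetical implementation results above, the rule
flips the sign of the second but not the first, so the normalized results
differ grossly, while comparison up to an overall sign shows that they are
essentially the same.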
!ACATS test
These changes should be reflected in the ACATS tests for these packages to the
extent possible.
!appendix
[Summary of private mail of July 2007; used by demand, er, permission.]
Pascal Leroy:
It appears that the eigenvectors returned by Eigensystem in
Generic_Real_Arrays and Generic_Complex_Arrays are not well specified, and
this has unfortunate effects. One problem is that the result of these
subprograms is hard to test. A more serious problem is that different
implementations may return different eigenvectors, and this may cause
portability problems. Also, even in a single implementation, changes to
the internal algorithms, or to the code generated for the annex G
generics, may cause the result of Eigensystem to change. This
non-determinism is unpleasant.
Consider first the real case. If V1, V2, ..., Vn are eigenvectors that
are mutually orthonormal, then changing the signs of any subset of these
vectors also results in a set of eigenvectors that are mutually orthonormal
(this is easily seen geometrically: it amounts to changing the direction of
the vectors in an orthonormal basis). It would have been nice to be more
specific, for instance by requiring the first nonzero component of each
vector to be positive.
Consider now the complex case. Things are more nasty here because you
have much more freedom: you can multiply each vector by a complex number
of modulus 1, and still have an orthonormal set. One way to lift the
ambiguity would be to require that the first nonzero component of each
vector be a positive real. But this would require a lot of complex
divisions, which would degrade the accuracy somewhat. Not sure what to do
here.
John Barnes:
Another source of difficulty is if two or more eigenvalues are the same.
This results in a whole subspace of vectors and any orthogonal set in that
subspace will do. I don't see how one could overcome that easily either.
Pascal Leroy:
True, but it's hard to be excited about this. Even if the matrix has two
mathematically identical eigenvalues, it's hard to believe that the
numerical algorithm used will find the floating-point values of the
eigenvalues be exactly identical, so the eigenvectors won't form a whole
subspace after all. If, conversely, it turns out that the matrix didn't
have mathematically identical eigenvalues, but the numerical algorithm
produces two identical floating-point values, then the whole thing is
probably ill-conditioned, so you get what you get.
****************************************************************
From: John Barnes
Sent: Friday, August 3, 2007 2:17 AM
Perhaps you would feel happier about eigenvalues with an example of their
use in nature. They turn up in all sorts of physical situations. Perhaps the
simplest is the moment of inertia of a solid object such as the earth. Now
the earth is a slightly flattened sphere. As a consequence its moment of
inertia about the polar axis is more than the moment of inertia about an
axis going through the equator. Moreover, it will spin smoothly about any of
these axes. But if you attempt to spin it about an arbitrary axis like an
axis through Madison, Paris or London then it will wobble because it is not
symmetric about that axis.
The axes where there is no wobble are known as the principal axes. The
moments of inertia about them are the eigenvalues and the axes themselves
are the eigenvectors.
In the case of an oblate spheroid like the earth, one principal axis is the
polar axis and any two at right angles through the equator can be taken as
the other axes. That's because two of the eigenvalues are the same. A
general rigid body will have three distinct eigenvalues and no confusion
about the axes.
The eigenvalues are easy and there is never any dispute about them. The
problem is the eigenvectors because for one thing you can take them in
either direction. From North to South pole or vice versa for example. And if
two or more eigenvalues coincide then the axes can be chosen in lots of
different ways.
These eigensystems turn up in lots of physical situations. One other that
immediately springs to mind is in elasticity. Something like a block of wood
has a different elasticity along the grain and across the grain.
They also turn up in statistics when estimating several parameters.
The complex number stuff turns up in quantum mechanics.
Here endeth the first lesson.
---
Another example I should have mentioned is an aeroplane propeller. If it has
just two blades then the three moments of inertia (three eigenvalues) are
different. I believe this causes vibration when turning. A four bladed
propeller has two eigenvalues the same and no vibration. A three bladed
propeller is OK as well perhaps surprisingly.
---
> I for one cannot think of a practical application of complex eigenvalues in
> finite-dimensional spaces.
Got it - I knew it was something about angular momentum.
The Pauli spin matrices are Hermitian. See for example Penrose, The Road to
Reality page 550-551. The matrices are (using good old Ada aggregates)
L1 = ((0, 1), (1, 0))
L2 = ((0, -i), (i, 0))
L3 = ((1, 0), (0, -1))
OK so L1 and L3 are only real, but L2 is the proper thing. It is about the
simplest possible Hermitian matrix.
The eigenvalues are the possible spin values, which are in fact real
(half-integer, really).
****************************************************************