Solving Hermitian Eigenvalue Problems

LI Jiaze,Wed Mar 05 2025•math

This blog documents the methodology and implementation details to achieve performance (time and memory) comparable to LAPACK. The full code is on GitHub (opens in a new tab)

Method

Let $A$ be a Hermitian (or symmetric) $n \times n$ matrix. A scalar $\lambda$ is called an eigenvalue and a nonzero column vector $z$ the corresponding eigenvector if $Az = \lambda z$ . $\lambda$ is always real in our case since $A$ is Hermitian.

The basic task of the Hermitian eigenvalue problem routines is to compute values of $\lambda$ and, optionally, corresponding vectors $z$ for a given matrix $A$ . Therefore we reformulate the problem as a eigendecomposition $A = Z \Lambda Z^*$ , where $\Lambda$ is a diagonal matrix and $Z$ is a unitary matrix with each column as an eigenvector. This calculation is done by the following steps:

Reduce the Hermitian matrix $A$ into (real) tridiagonal form $T$ i.e. $A = U T U^*$ ;
Solve the eigen decomposition of $T$ i.e. $T = Q \Lambda Q^\top$ . $\Lambda$ contains the eigenvalues of $T$ , which are also the eigenvalues of $A$ , since the first step is a similarity transformation;
(Optional) Compute the eigenvectors of $A$ which are the columns of $Z = U Q$ . (If there is no need to compute eigenvectors, there is no need to store $U$ and $Q$ )

Householder Tridiagonalization

If $A$ is Hermitian, we construct a unitary matrix $U$ s.t.

U^* A U = T

is tridiagonal. Here we use Householder reflector to construct $U$ . Suppose that Householder matrices $H_1, ..., H_{k-1}$ has been determined s.t.

A_{k-1}=\left(H_1 \cdots H_{k-1}\right)^* A\left(H_1 \cdots H_{k-1}\right),

is tridiagonal through its first $k-1$ columns.

A_{k-1} = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \\ 0 & B_{32} & B_{33} \end{bmatrix} \quad \begin{array}{c} _{k-1} \\ _{1} \\ _{n-k} \end{array}

If $\tilde{H}_k$ is an $(n-k)$ -order Householder matrix s,t, $\tilde{H}_k^*B_{32} = [\beta, 0, ..., 0]^\top$ where $\beta = ||B_{32}||_2 \in \mathbb{R}$ and if $H_k = \text{diag}(I_k, \tilde{H}_k)$ , then

A_k = H_k^* A_{k-1} H_k = \begin{bmatrix} B_{11} & B_{12} & 0 \\ B_{21} & B_{22} & B_{23} \tilde{H}_k \\ 0 & \tilde{H}_k^* B_{32} & \tilde{H}_k^* B_{33} \tilde{H}_k \end{bmatrix} \quad \begin{array}{c} _{k-1} \\ _1 \\ _{n-k} \end{array}

is tridiagonal through its first $k$ columns. Therefore, if $U = H_1 \cdots H_{n-2}$ , then $U^* A U = T$ is real tridiagonal.

In the calculation of $A_k$ it is important to exploit symmetry during the formation of the matrix $\tilde{H}_k^* B_{33} \tilde{H}_k$ (rank-2 update). To be specific, as a Householder reflection, $\tilde{H}_k$ has the form

\tilde{H}_k=I-\tau u u^*, \quad \tau=2 / u^* u, \quad 0 \neq u \in \mathbb{C}^{n-k} .

Note that if $p=\tau^* B_{33} u$ and $w=p-\left(\tau p^* u / 2\right) u$ , then

\tilde{H}_k^* B_{33} \tilde{H}_k=B_{33}-u w^*-w u^* .

QR Iteration with Shifts and Deflation

Once the tridiagonal matrix $T$ is obtained, we construct a orthogonal matrix $Q$ s.t.

Q^\top T Q = \Lambda

is diagonal. We use QR iteration to construct $Q$ and compute $\Lambda$ , or in other words, with Givens rotation

T'=G_{\text{all}}^\top T G_{\text{all}},\quad G_{\text{all}}=G_{1, 2} \cdots G_{n-1, n}

until it converge to $\Lambda$ .

To ensure convergence, we use the (implicit) Wilkinson shift (2.5.11 Theorem).

To reduce computation, we detect the largest unreduced part before each QR step (deflation). In the program, it's hard to iterate until the sub-diagonal is exactly 0, so we set

h_{i, i-1}=0, \quad \text { where } \quad\left|h_{i, i-1}\right| \leq \text{eps}\left(\left|h_{i, i}\right|+\left|h_{i-1, i-1}\right|\right)

Implementation Details

Computer programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. — Donald Knuth

This chapter is about the details used in the code that reduce computation and memory usage, and improve numerical stability. For full implementation please refer to the code.

Templates

The entire program is implemented through template classes or functions e.g.

template<typename MatrixType>
class HermitianEigenSolver;

The advantage of using classes is that all memory, including that needed for temporary variables (see Workspace), is requested at the initialization. The advantage of using templates is that the entire eigenproblem solver can be applied to different kinds of dense matrices, real symmetric or Hermitian matrices with both single and double precision. One more advantage is that many functions can take not only arma::Mat as arguments, but also arma::subview and its references, which reduces lots of memory copy.

Workspace

At an instance of HermitianEigenSolver is initialized, we request a Scalar (can be complex) vector of size $2N$ as a workspace

VectorType mWorkspace;
mWorkspace(2 * matrix.n_cols);

In subsequent computations, (almost) all of the temporary variables will be stored on workspace to reuse memory and control the total usage.

Tridiagonalization

When doing tridiagonalization, the outermost function is as follow

template<typename MatrixType, typename RealVectorType, typename VectorType>
void tridiagonalization(MatrixType &A,
                        RealVectorType &diag,
                        RealVectorType &sub_diag,
                        VectorType &workspace,
                        bool withQ);

where the main and sub diagonals are returned by 2 vectors rather than dense arma::Mat. Also $Q$ is returned by A if it needs to be constructed.

As mentioned before, we keep reflect on $B_{32}$ in every step using Householder. Note that Householder has some special structure, $H v = [\beta, 0, ... , 0]^\top$ where $H = I-\tau u u^*$ and $u = [1, ...] ^\top$ . Combining this special structure with the need for temporary variables, the entire tridiagonalization process requires only memory of additional size $N$ . An example step is shown below.

euro

VectorType Householders(workspace.memptr(), n - 1, false, false);

This temporary variable is malloced on workspace, and at step $k$ , its first $k-1$ elements store the previous $\tau$ , while the remainder is used for the temporary variable $w$ (see rank-2 update).

When updating the matrix, we don't care about the result of $B_{23} \tilde{H}_k$ , but only about $\tilde{H}_k^* B_{33} \tilde{H}_k$ .

// w = conj(tau) * A[i+1:, i+1:] * u
Householders.tail(remain) = subA * (scalar_conj(tau) * tail);

Here we first do the scalar product to vector, then matrix multiply. This is because scalar product to matrix is much more computationally cost than scalar product to vector. Writing directly following the formula will increase the computation due to the fact that * is left associative in C++.

arma::Mat<Scalar> uw = tail * Householders.tail(remain).t();
subA -= (uw + uw.t());

For $\tilde{H}_k^* B_{33} \tilde{H}_k=B_{33}-u w^*-w u^*$ , but don't compute $uw^*$ and $wu^*$ twice as they are conjugate transposed. Also here it is possible to update only the upper diagonal of $B_{33}$ in recovering the complete matrix, but I haven't found a good way.

Householder Reflector

For a given vector $v$ , construct a Householder Reflector $H$ s.t. $H v = [\beta, 0, ... , 0]^\top$ , but instead of explicitly storing $H$ , store $\tau$ and $u$ s.t. $H = I-\tau u u^*$

template<typename VectorType, typename Scalar, typename RealScalar>
void make_Householder(VectorType &vector, Scalar &tau, RealScalar &beta);

Here we reuse vector for both input $v$ and output $u$ .

If eigenvectors are required, we should also construct the matrix $Q = H_{n-2} \cdots H_1$ or $Q = H_{n-2} \cdots H_1 I$ , and since there is no explicit $H$ , we need the following function for matrix $M$ s.t. $M = H M$

template<typename MatrixType, typename VectorType, typename Scalar>
void apply_Householder_left(MatrixType &M,
                            const VectorType &v,
                            const Scalar &tau,
                            Scalar *workspace);

Note that the v here is not the full $u$ , but is missing the first element. Still it is a reference from $B_{32}[1:]$ , whose first element should correctly store the sub-diagonal elements.

Givens Rotation

Constructing a Givens rotation s.t. $G^\top (a, b)^\top = (r, 0)^\top$ . The case of complex numbers is not considered here, since the tridiagonal matrix is already real. But it should be able to apply to complex matrix.

template<typename Scalar>
inline void make_givens(const Scalar &a, const Scalar &b,
                        Scalar &c, Scalar &s);

This time, for computing the eigenvectors, we no longer need left-multiply, but instead right-multiply $Q = Q G$

// A = A * G
template<typename Scalar, typename RealScalar>
void apply_givens_right(Scalar *matrix_ptr,
                        Index n_rows,
                        Index p, Index q,
                        const RealScalar &c, const RealScalar &s);

Here the property that the dense matrix of Armadillo are column-major order is utilized. The purpose of using pointers is to allow further less passing of parameters compared to arma::Mat references.

Scaling

Before starting, we scale the matrix to avoid overflow and scale the eigenvalues back when everything is done.

mat = matrix;
RealScalar scale = arma::abs(mat).max();
if (scale == RealScalar(0))
    scale = RealScalar(1);
mat /= scale;
...
mEigenValues *= scale;

Deflation

We perform each QR step only on the largest unreduced block, specifically, pointing out where the QR step starts and ends based on two indices.

for (Index i = start; i < end; ++i) {
    if (std::abs(sub_diag[i]) <= eps * (std::abs(diag[i]) + std::abs(diag[i + 1])))
        sub_diag[i] = RealScalar(0);
}
// find the largest unreduced block
while (end > 0 && (sub_diag[end - 1] == RealScalar(0)))
    end--;
if (end <= 0)
    break;
start = end - 1;
while (start > 0 && (sub_diag[start - 1] != RealScalar(0)))
    start--;

QR Step

A complete QR step is implemented by the following function

template<typename RealScalar, typename Scalar>
static void tri_diag_qr_step(RealScalar *diag,
                             RealScalar *sub_diag,
                             Index start, Index end,
                             Scalar *Q, Index n);

In which the Wilkinson shift is done by

RealScalar td = (diag[end - 1] - diag[end]) * RealScalar(0.5);
RealScalar e = sub_diag[end - 1];
RealScalar mu = diag[end];
if (td == RealScalar(0)) {
    mu -= std::abs(e);
} else if (e != RealScalar(0)) {
    const RealScalar e2 = e * e;
    const RealScalar h = std::hypot(td, e);
    if (e2 == RealScalar(0)) {
        mu -= e / ((td + (td > RealScalar(0) ? h : -h)) / e);
    } else {
        mu -= e2 / (td + (td > RealScalar(0) ? h : -h));
    }
}

Instead of directly using the formula given in 2.5.7 Lemma

\sigma=a_n+d-\operatorname{sign}(d) \sqrt{d^2+b_{n-1}^2}, \quad d=\frac{a_{n-1}-a_n}{2}.

It is because scaling $d^2$ and $b_{n-1}^2$ will not overflow, but they may underflow.