Note: This is a working paper which will be expanded/updated frequently. All suggestions for improvement are welcome. The directory gifi.stat.ucla.edu/sjacobi has a pdf version, the bib file, the R and C code, and the complete Rmd file.

1 Introduction

Suppose \(A_j\) are \(m\) real symmetric matrices of order \(n\). Our problem is to find a square orthonormal \(K\), i.e. a matrix with \(KK'=K'K=I\), such that the matrices \(H_j\mathop{=}\limits^\Delta K'A_jK\) are as diagonal as possible in the least squares sense. Thus we want to minimize the sum of squares of all off-diagonal elements of the \(H_j\).

It is a natural idea to use Jacobi plane rotations, the orthonormal version of cyclic coordinate descent, for this. The algorithm was proposed, possibly first, by De Leeuw and Pruzansky (1978). It was subsequently programmed in FORTRAN for the Applied Statistics algorithms by Clarkson (1988). Numerical analysts arrived later on the scene, but contributed a number of essential refinements and extensions, such as an algorithm for Hermitian matrices and a proof of quadratic convergence (Bunse-Gerstner, Byers, and Mehrmann (1993), Cardoso and Souloumiac (1996)). There is a version of the algorithm in R (R Core Team (2018)) in De Leeuw (2008), together with a number of related algorithms using Jacobi plane rotations. De Leeuw (2017) has an R/C program (i.e. calling routine in R, computational routines in C) for \(m=1\), the classical Jacobi algorithm, but using compact triangular storage of symmetric matrices. In this paper we provide an R/C compact storage version of the simultaneous diagonalization algorithm.

2 Algorithm

In the classical Jacobi method we cycle through all off-diagonal elements in a fixed order, minimizing the sum of squares over single-parameter plane rotations each time. After a cycle is completed we test for convergence. If there is still room for improvement we start a new cycle.

For simplicity we show how to find the rotation angle for the leading off-diagonal element, i.e. the one in position \((1,2)\). The other rotations are handled in precisely the same way. Our derivation is basically the same as the one in De Leeuw and Pruzansky (1978), but perhaps more explicit, and with some minor simplifications.

Suppose \(\overline{K}\) is square orthonormal, with a leading \(2\times 2\) square orthonormal submatrix \(K\) and with the identity as its second principal submatrix. Thus \[ \overline{K}\mathop{=}\limits^\Delta\begin{bmatrix}K&0\\0&I\end{bmatrix}. \] Suppose \(\overline{A}\) is a square symmetric matrix, partitioned in the same way as \(\overline{K}\). Thus \[ \overline{A}\mathop{=}\limits^\Delta\begin{bmatrix}A&B\\B'&C\end{bmatrix}. \] If \(\overline{H}\mathop{=}\limits^\Delta \overline{K}'\overline{A}\overline{K}\) then \[ \overline{H}=\begin{bmatrix}H&G\\G'&F\end{bmatrix}=\begin{bmatrix}K'AK&K'B\\B'K&C\end{bmatrix}. \]

The sum of squares of the elements of \(G=K'B\) is the same as the sum of squares of the elements of \(B\). Thus the only change in the sum of squares of the off-diagonal elements is the change from \(a_{12}\) to \(h_{12}\). Note that the optimum \(K\) is clearly not unique, because interchanging the two columns, and/or multiplying them by \(-1\), does not change the square of the off-diagonal element \(h_{12}\).

Now we can write the \(2\times 2\) plane rotation matrix \(K\) as \[ K=\begin{bmatrix}\cos(\theta)&\sin(\theta)\\-\sin(\theta)&\cos(\theta)\end{bmatrix}, \] and the \(2\times 2\) symmetric matrix \(A\) as \[ A=\begin{bmatrix} a&b\\b&c \end{bmatrix}. \] Then \(H=K'AK\) works out to \[\begin{align}\label{E:h11} h_{11}&=a\cos^2(\theta)-2b\sin(\theta)\cos(\theta)+c\sin^2(\theta),\\ h_{12}=h_{21}&=b(\cos^2(\theta)-\sin^2(\theta))+(a-c)\sin(\theta)\cos(\theta),\\ h_{22}&=c\cos^2(\theta)+2b\sin(\theta)\cos(\theta)+a\sin^2(\theta). \end{align}\]

Thus for \(m\) such matrices \(A_k\) we want to find the optimal plane rotation by minimizing \[\begin{equation}\label{E:sincos} f(\theta)=\sum_{k=1}^m\ (b_k(\cos^2(\theta)-\sin^2(\theta))+(a_k-c_k)\sin(\theta)\cos(\theta))^2 \end{equation}\] over \(\theta\). By the double-angle formulas from trigonometry, \[\begin{equation}\label{E:double} f(\theta)=\sum_{k=1}^m (b_k\cos(2\theta)+d_k\sin(2\theta))^2, \end{equation}\] where \(d_k\mathop{=}\limits^\Delta\frac12(a_k-c_k)\). Let \(u=\cos(2\theta)\) and \(v=\sin(2\theta)\). Define the \(2\times 2\) matrix \(S\) by \[\begin{align*} s_{11}&=\sum_{k=1}^m b_k^2,\\ s_{12}=s_{21}&=\sum_{k=1}^m b_kd_k,\\ s_{22}&=\sum_{k=1}^m d_k^2. \end{align*}\] Then \(f(u,v)=s_{11}u^2+2s_{12}uv+s_{22}v^2\), which we must minimize over \(u^2+v^2=1\). The minimizing \((u,v)\) is a normalized eigenvector corresponding to the smallest eigenvalue of \(S\).
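The minimizing pair can be computed directly with R's eigen(). Here is a minimal sketch (the helper uvrot is ours, not part of the code in section 5); it anticipates the sign convention, chosen below, that \(v\leq 0\).

uvrot <- function (a, b, c) {
  # a, b, c contain the elements a_k, b_k, c_k of the m matrices
  d <- (a - c) / 2
  s <- matrix (c (sum (b * b), sum (b * d), sum (b * d), sum (d * d)), 2, 2)
  # eigen() sorts eigenvalues in decreasing order, so column 2 is smallest
  uv <- eigen (s, symmetric = TRUE)$vectors[, 2]
  if (uv[2] > 0) -uv else uv
}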

The smallest eigenvalue of \(S\) is \[\begin{equation} \lambda=\frac12\{(s_{11}+s_{22})-\sqrt{(s_{11}-s_{22})^2+4s_{12}^2}\}. \end{equation}\] We first discuss the “regular” case, with \(s_{12}\not= 0\). The two eigenvalues of \(S\) are different, and \(\lambda<\min(s_{11},s_{22})\). One choice for the corresponding normalized eigenvector has elements \[\begin{align*} u&=\frac{s_{12}}{\sqrt{s_{12}^2+(\lambda-s_{11})^2}},\\ v&=\frac{\lambda-s_{11}}{\sqrt{s_{12}^2+(\lambda-s_{11})^2}}. \end{align*}\] Both \(u\) and \(v\) are non-zero, with \(v<0\). The second choice for the normalized eigenvector has elements \(-u\) and \(-v\), but since it produces the same loss function value we will ignore it from now on.

After computing \(u\) and \(v\) we must still solve \(\cos(2\theta)=u\) and \(\sin(2\theta)=v\) for \(\cos(\theta)\) and \(\sin(\theta)\) to construct the rotation matrices. We have to be careful here: because of the periodicity of the sine and cosine these equations have multiple solutions. In De Leeuw and Pruzansky (1978) we solve \(\tan(2\theta)=v/u\) for \(\theta\), and then compute \(\sin(\theta)\) and \(\cos(\theta)\). Here we follow a slightly different, and seemingly more direct, route.

From the trigonometric half-angle formulas we have \[\begin{align*} \cos(\theta)&=\frac12\sigma_1\sqrt{2(1 + u)},\\ \sin(\theta)&=\frac12\sigma_2\sqrt{2(1 - u)}, \end{align*}\] where \(\sigma_1\) and \(\sigma_2\) are either \(+1\) or \(-1\). This has \(\cos^2(\theta)-\sin^2(\theta)=u\) and \[ v=2\sin(\theta)\cos(\theta)=\sigma_1\sigma_2\sqrt{1-u^2}=\sigma_1\sigma_2|v|. \] Since \(v<0\), we must choose opposite signs, i.e. \(\sigma_1\sigma_2=-1\). We choose the solution \[ K=\begin{bmatrix}\hfill\frac12\sqrt{2(1 + u)}&\hfill-\frac12\sqrt{2(1 - u)}\\\frac12\sqrt{2(1 - u)}&\hfill\frac12\sqrt{2(1 + u)}\end{bmatrix}. \] In the “degenerate” case, in which \(s_{12}=0\), we have \(f(u,v)=s_{11}u^2+s_{22}v^2\). There are three different cases to consider. This is basically because the eigenvector corresponding to the smallest eigenvalue is not a continuous function of its matrix argument.

  1. If \(s_{12}=0\) and \(s_{11}>s_{22}\) then \(\lambda=s_{22}\) and \(\cos(2\theta)=0\), which implies \(\cos(\theta)=\pm\frac12\sqrt{2}\) and \(\sin(\theta)=-\cos(\theta)\). The signs can be chosen arbitrarily; our code sets \[ K=\begin{bmatrix}\frac12\sqrt{2}&-\frac12\sqrt{2}\\\frac12\sqrt{2}&\hfill\frac12\sqrt{2}\end{bmatrix}. \]

  2. If \(s_{12}=0\) and \(s_{11}<s_{22}\) then \(\lambda=s_{11}\) and \(\cos(2\theta)=\pm 1\), which implies the two solutions \(\cos(\theta)=\pm 1\) and \(\sin(\theta)=0\), and \(\cos(\theta)=0\) and \(\sin(\theta)=\pm 1\). We can use either solution, with signs chosen arbitrarily, because the rotation is just sign changes and permutations of columns. In our code we take \(\cos(\theta)=1\) and \(\sin(\theta)=0\), which means \(K\) is the identity, and there is no rotation.

  3. If \(s_{12}=0\) and \(s_{11}=s_{22}\) then \(\lambda=s_{11}=s_{22}\) and the eigenvector is an arbitrary normalized vector. This is the only case in which the largest and smallest eigenvalue are equal. Again we choose \(\cos(\theta)=1\) and \(\sin(\theta)=0\), and do not rotate.
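Collecting the regular case and the three degenerate cases, the per-pivot computation of \(\cos(\theta)\) and \(\sin(\theta)\) amounts to the following sketch (in R, with names of our choosing; the C routine of section 5.2.2 implements the same logic):

rotangle <- function (s11, s12, s22, eps = 1e-15) {
  # cases 2 and 3: no rotation
  if ((abs (s12) < eps) && (s11 <= s22)) {
    return (c (1, 0))
  }
  # regular case and case 1: smallest eigenvalue of S, then u = cos(2 theta)
  lbd <- ((s11 + s22) - sqrt ((s11 - s22) ^ 2 + 4 * s12 ^ 2)) / 2
  u <- s12 / sqrt (s12 ^ 2 + (lbd - s11) ^ 2)
  # half-angle formulas, with sigma_1 * sigma_2 = -1
  return (c (sqrt ((1 + u) / 2), -sqrt ((1 - u) / 2)))
}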

3 Illustration

Suppose we have three \(2\times 2\) symmetric matrices

##      [,1] [,2]
## [1,] +1   -1  
## [2,] -1   +1
##      [,1] [,2]
## [1,] +2   +0  
## [2,] +0   +0
##      [,1] [,2]
## [1,] +1   -2  
## [2,] -2   +0

The sum of squares of the off-diagonal elements (counting each symmetric pair once) is 5.

For \(2\times 2\) matrices we only have to compute a single plane rotation; there is no cycling. So let’s perform the necessary calculations.
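The following plain R sketch (full matrices rather than compact storage, variable names ours) carries out these calculations.

amats <- list (matrix (c (1, -1, -1, 1), 2, 2),
               matrix (c (2, 0, 0, 0), 2, 2),
               matrix (c (1, -2, -2, 0), 2, 2))
b <- sapply (amats, function (x) x[1, 2])
d <- sapply (amats, function (x) (x[1, 1] - x[2, 2]) / 2)
s <- matrix (c (sum (b * b), sum (b * d), sum (b * d), sum (d * d)), 2, 2)
uv <- eigen (s, symmetric = TRUE)$vectors[, 2]
if (uv[2] > 0) uv <- -uv # sign convention v < 0
u <- uv[1]
k <- matrix (c (sqrt ((1 + u) / 2), sqrt ((1 - u) / 2),
                -sqrt ((1 - u) / 2), sqrt ((1 + u) / 2)), 2, 2)
hmats <- lapply (amats, function (x) crossprod (k, x) %*% k)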

The matrix \(S\) is

##      [,1]  [,2] 
## [1,] +5.00 -1.00
## [2,] -1.00 +1.25

with smallest eigenvalue 1 and corresponding eigenvector \((-0.242535625, -0.9701425001)\). Thus \(u=\cos(2\theta)=-0.242535625\).


From \(u\) we compute \(\frac12\sqrt{2(1 + u)}\) and \(\frac12\sqrt{2(1 - u)}\), which are respectively 0.6154122094 and 0.788205438. And the rotation matrix is

##      [,1]      [,2]     
## [1,] +0.615412 -0.788205
## [2,] +0.788205 +0.615412

The rotated matrices are

##      [,1]      [,2]     
## [1,] +0.029857 +0.242536
## [2,] +0.242536 +1.970143
##      [,1]      [,2]     
## [1,] +0.757464 -0.970143
## [2,] -0.970143 +1.242536
##      [,1]      [,2]     
## [1,] -1.561553 +0.000000
## [2,] +0.000000 +2.561553

with a sum of squares of off-diagonal elements of 1.

4 Examples

4.1 Three Two by Two Matrices

This is the same example we used as an illustration earlier.

a <- c(1,-1,1,2,0,0,1,-2,0)
h <- sjacobi (a, 2, 3)

After 2 cycles (the second of which merely confirms convergence) we find the rotation

##              [,1]          [,2]
## [1,] 0.6154122094 -0.7882054380
## [2,] 0.7882054380  0.6154122094

The loss function has been reduced from 5 to 1. The rotated matrices are

##      [,1]      [,2]     
## [1,] +0.029857 +0.242536
## [2,] +0.242536 +1.970143
##      [,1]      [,2]     
## [1,] +0.757464 -0.970143
## [2,] -0.970143 +1.242536
##      [,1]      [,2]     
## [1,] -1.561553 +0.000000
## [2,] +0.000000 +2.561553

The results are the same as those computed “by hand”.

4.2 One Matrix

The algorithm can obviously also be applied if \(m=1\), in which case it becomes the cyclic version of the Jacobi method. Minimum loss should be zero.
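The test matrix has the integers 1 through 55 as its lower triangle, stored columnwise, so in compact storage it is presumably constructed along these lines (using triangle2matrix() from section 5.1.1):

a <- as.double (1:55)
amat <- triangle2matrix (a)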

##       [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
##  [1,]   1    2    3    4    5    6    7    8    9   10  
##  [2,]   2   11   12   13   14   15   16   17   18   19  
##  [3,]   3   12   20   21   22   23   24   25   26   27  
##  [4,]   4   13   21   28   29   30   31   32   33   34  
##  [5,]   5   14   22   29   35   36   37   38   39   40  
##  [6,]   6   15   23   30   36   41   42   43   44   45  
##  [7,]   7   16   24   31   37   42   46   47   48   49  
##  [8,]   8   17   25   32   38   43   47   50   51   52  
##  [9,]   9   18   26   33   39   44   48   51   53   54  
## [10,]  10   19   27   34   40   45   49   52   54   55
h <- sjacobi (a, 10, 1)

In 20 cycles we reduce loss from 42318 to 0.0000000782, and we find the rotation

##       [,1]    [,2]    [,3]    [,4]    [,5]    [,6]    [,7]    [,8]   
##  [1,] +0.8213 +0.1966 +0.2084 +0.1446 -0.1921 -0.2167 +0.2657 -0.2100
##  [2,] +0.1469 -0.0265 -0.0006 +0.5774 +0.1926 +0.2692 -0.6107 +0.0158
##  [3,] +0.0455 -0.0312 +0.0045 +0.3534 -0.3593 +0.5442 +0.1448 -0.0067
##  [4,] +0.0372 +0.1594 +0.0060 -0.1495 +0.5647 +0.5170 +0.4808 +0.0004
##  [5,] +0.0522 -0.4035 +0.0089 -0.4443 -0.5193 +0.2881 -0.0984 -0.0447
##  [6,] +0.0825 +0.6824 +0.0155 -0.3980 -0.0446 +0.0179 -0.4241 +0.1604
##  [7,] +0.1209 -0.3820 +0.0128 -0.1658 +0.3816 -0.1796 -0.2180 -0.5604
##  [8,] +0.1207 -0.3314 +0.2594 +0.0617 +0.1448 -0.2690 +0.0704 +0.7341
##  [9,] -0.0467 +0.0729 -0.8120 +0.1998 -0.1024 -0.2726 +0.1830 +0.0130
## [10,] -0.5094 +0.2114 +0.4790 +0.2673 -0.1711 -0.2344 +0.1697 -0.2734
##       [,9]    [,10]  
##  [1,] +0.1418 -0.0614
##  [2,] +0.3752 -0.1442
##  [3,] -0.6162 -0.2146
##  [4,] +0.2440 -0.2721
##  [5,] +0.4146 -0.3173
##  [6,] -0.1938 -0.3513
##  [7,] -0.3621 -0.3759
##  [8,] -0.1138 -0.3929
##  [9,] +0.1045 -0.4049
## [10,] +0.1884 -0.4143

The diagonal of the rotated matrix is

mprint(diag(triangle2matrix(h$afinal)), d = 10, w = 15)
##  [1]   -1.8824366513    1.0699214092    0.1409608688    6.6137980129
##  [5]    1.5323398746   12.1639813624    2.8050481540    0.5991942821
##  [9]    2.1774756326  314.7797170547

which can be compared with the eigenvalues computed by R

eigen(amat)$values
##  [1] 314.7797170547  12.1639813624   6.6137980129   2.8050481734
##  [5]   2.1774756456   1.5323398746   1.0699214091   0.5991942823
##  [9]   0.1409608363  -1.8824366513

Clearly the eigenvalues computed by sjacobi and by eigen are the same, except for order. The same holds for the eigenvectors, up to order and sign.
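One way to check the claim about the eigenvectors (a sketch, assuming the objects computed above): the absolute cross products of the two sets of eigenvectors should form a permutation matrix.

kmat <- matrix (h$k, 10, 10)
evec <- eigen (amat)$vectors
round (abs (crossprod (kmat, evec)), 3) # a permutation matrix, up to rounding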

4.3 General Case

For our last example we construct 4 commuting matrices of order 4. Minimum loss should be zero, again.

set.seed(12345)
c1 <- crossprod (matrix(rnorm(40), 10, 4))
ee <- eigen(c1)$vectors
c2 <- tcrossprod (ee %*% diag(rnorm(4)), ee)
c3 <- tcrossprod (ee %*% diag(rnorm(4)), ee)
c4 <- tcrossprod (ee %*% diag(rnorm(4)), ee)
a <- c(c1[outer(1:4, 1:4, ">=")], c2[outer(1:4, 1:4, ">=")],
       c3[outer(1:4, 1:4, ">=")], c4[outer(1:4, 1:4, ">=")])
h <- sjacobi (a, 4, 4)

After 8 cycles we have reduced loss from 332.8265349291 to 0.0000000045. Close enough to zero.
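As a further sanity check (a sketch, using the objects computed above), the rotation returned in h$k should be orthonormal and should diagonalize each of the four matrices, for instance c2.

kmat <- matrix (h$k, 4, 4)
max (abs (crossprod (kmat) - diag (4))) # orthonormality, of the order of machine precision
h2 <- crossprod (kmat, c2 %*% kmat) # K'C2K, nearly diagonal
max (abs (h2 - diag (diag (h2))))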

5 Code

5.1 R code

5.1.1 sjacobi.R

dyn.load("sjacobi.so")

sjacobi <-
  function (a,
            n,
            m,
            eps = 1e-15,
            itmax = 100,
            vectors = TRUE,
            verbose = FALSE) {
    h <- .C(
      "sjacobi",
      n = as.integer (n),
      m = as.integer (m),
      a = as.double (a),
      k = as.double (rep(0, n * n)),
      fstart = as.double (0),
      ffinal = as.double (0),
      itel = as.integer (0),
      work1 = as.double (rep(0, n)),
      work2 = as.double (rep(0, n)),
      itmax = as.integer(itmax),
      eps = as.double(eps),
      verbose = as.integer(verbose),
      vectors = as.integer(vectors)
    )
    return(list(
      astart = a,
      afinal = h$a,
      k = h$k,
      fstart = h$fstart,
      ffinal = h$ffinal,
      itel = h$itel
    ))
  }

triangle2matrix <- function (x) {
  m <- length (x)
  n <- round ((sqrt (1 + 8 * m) - 1) / 2)
  h <-
    .C("trimat", as.integer (n), as.double (x), as.double (rep (0, n * n)))
  return (matrix(h[[3]], n, n))
}

matrix2triangle <- function (x) {
  n <- dim(x)[1]
  m <- n * (n + 1) / 2
  h <-
    .C("mattri", as.integer (n), as.double (x), as.double (rep (0, m)))
  return (h[[3]])
}
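
# Example round trip between the two storage formats:
#   x <- matrix (c (1, 2, 2, 3), 2, 2)
#   matrix2triangle (x) # gives c(1, 2, 3)
#   triangle2matrix (matrix2triangle (x)) # gives back x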

trianglePrint <- function (x,
                           width = 6,
                           precision = 4) {
  m <- length (x)
  n <- round ((sqrt (1 + 8 * m) - 1) / 2)
  h <-
    .C("pritru",
       as.integer(n),
       as.integer(width),
       as.integer(precision),
       as.double (x))
}

matrixPrint <- function (x,
                         width = 6,
                         precision = 4) {
  n <- nrow (x)
  m <- ncol (x)
  h <-
    .C(
      "primat",
      as.integer(n),
      as.integer(m),
      as.integer(width),
      as.integer(precision),
      as.double (x)
    )
}

5.2 C Code

5.2.1 sjacobi.h

#ifndef SJACOBI_H
#define SJACOBI_H

#include <math.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

static inline int VINDEX(const int);
static inline int MINDEX(const int, const int, const int);
static inline int SINDEX(const int, const int, const int);
static inline int TINDEX(const int, const int, const int);
static inline int AINDEX(const int, const int, const int, const int, const int);
static inline int UINDEX(const int, const int, const int, const int, const int);

static inline double SQUARE(const double);
static inline double THIRD(const double);
static inline double FOURTH(const double);
static inline double FIFTH(const double);

static inline double MAX(const double, const double);
static inline double MIN(const double, const double);
static inline int IMIN(const int, const int);
static inline int IMAX(const int, const int);
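
/* All index helpers below take one-based subscripts and return zero-based
   offsets. MINDEX addresses a matrix with n rows stored columnwise, AINDEX
   the k-th such matrix in a stack. SINDEX and TINDEX address a symmetric
   matrix of order n stored as its lower triangle, columnwise, respectively
   without and with the diagonal. UINDEX addresses element (i, j), with
   i >= j, of the k-th of m stacked lower triangles, each of length
   n * (n + 1) / 2. */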

static inline int VINDEX(const int i) { return i - 1; }

static inline int MINDEX(const int i, const int j, const int n) {
  return (i - 1) + (j - 1) * n;
}

static inline int AINDEX(const int i, const int j, const int k, const int n,
                         const int m) {
  return (i - 1) + (j - 1) * n + (k - 1) * n * m;
}

static inline int SINDEX(const int i, const int j, const int n) {
  return ((j - 1) * n) - (j * (j - 1) / 2) + (i - j) - 1;
}

static inline int TINDEX(const int i, const int j, const int n) {
  return ((j - 1) * n) - ((j - 1) * (j - 2) / 2) + (i - (j - 1)) - 1;
}

static inline int UINDEX(const int i, const int j, const int k, const int n,
                         const int m) {
  return ((k - 1) * n * (n + 1) / 2) + ((j - 1) * n) - ((j - 1) * (j - 2) / 2) +
         (i - (j - 1)) - 1;
}

static inline double SQUARE(const double x) { return x * x; }
static inline double THIRD(const double x) { return x * x * x; }
static inline double FOURTH(const double x) { return x * x * x * x; }
static inline double FIFTH(const double x) { return x * x * x * x * x; }

static inline double MAX(const double x, const double y) {
  return (x > y) ? x : y;
}

static inline double MIN(const double x, const double y) {
  return (x < y) ? x : y;
}

static inline int IMAX(const int x, const int y) { return (x > y) ? x : y; }

static inline int IMIN(const int x, const int y) { return (x < y) ? x : y; }

void sjacobi(const int *, int *, double *, double *, double *, double *, int *,
             double *, double *, int *, double *, bool *, bool *);

void primat(const int *, const int *, const int *, const int *, const double *);
void pritru(const int *, const int *, const int *, const double *);
void prisru(const int *, const int *, const int *, const int *, const double *);
void trimat(const int *, const double *, double *);
void mattri(const int *, const double *, double *);

#endif /* SJACOBI_H */

5.2.2 sjacobi.c

#include "sjacobi.h"

/*
int main(void) {
  int n = 4, m = 4, it = 0, itmax = 100, w = 10, p = 6;
  bool verbose = true, vectors = true;
  double a[40], evec[16], oldi[4], oldj[4], fini = 0, f = 0, eps = 1e-15;
  for (int i = 1; i <= 40; i++)
    a[VINDEX(i)] = (double)i;
  (void)sjacobi(&n, &m, a, evec, &fini, &f, &it, oldi, oldj, &itmax, &eps,
                &verbose, &vectors);
  (void)primat(&n, &n, &w, &p, evec);
  (void)prisru(&n, &m, &w, &p, a);
}
*/

/*
int main(void) {
  int n = 2, m = 3, it = 0, itmax = 100;
  bool verbose = true, vectors = true;
  double a[9] = {1.0, -1.0, 1.0, 2.0, 0.0, 0.0, 1.0, -2.0, 0.0};
  double evec[4] = {0.0, 0.0, 0.0, 0.0};
  double oldi[2] = {0.0, 0.0}, oldj[2] = {0.0, 0.0}, fini = 0, f = 0,
         eps = 1e-15;
  (void)sjacobi(&n, &m, a, evec, &fini, &f, &it, oldi, oldj, &itmax, &eps,
                &verbose, &vectors);
}
*/

/*
int main(void) {
  int n = 10, m = 1, it = 0, itmax = 100, w = 10, p = 6;
  bool verbose = true, vectors = true;
  double a[55], oldi[10], evec[100], oldj[10], fini = 0, f = 0, eps = 1e-16;
  for (int i = 1; i <= 55; i++) {
    a[VINDEX(i)] = (double)i;
  }
  (void)sjacobi(&n, &m, a, evec, &fini, &f, &it, oldi, oldj, &itmax, &eps,
                &verbose, &vectors);
  (void)pritru(&n, &w, &p, a);
  (void)primat(&n, &n, &w, &p, evec);
}
*/

void sjacobi(const int *nn, int *mm, double *a, double *evec, double *fini,
             double *f, int *it, double *oldi, double *oldj, int *itmax,
             double *eps, bool *verbose, bool *vectors) {
  int n = *nn, m = *mm, itel = 1;
  double d = 0.0, cost = 0.0, sint = 0.0, u = 0.0, p = 0.0, q = 0.0, r = 0.0, piil = 0.0, pijl = 0.0, lbd = 0.0, dd = 0.0,
         pp = 0.0, dp = 0.0;
  double fold = 0.0, fnew = 0.0;
  if (*vectors) {
    for (int i = 1; i <= n; i++) {
      for (int j = 1; j <= n; j++) {
        evec[MINDEX(i, j, n)] = (i == j) ? 1.0 : 0.0;
      }
    }
  }
  for (int k = 1; k <= m; k++) {
    for (int i = 1; i <= n - 1; i++) {
      for (int j = i + 1; j <= n; j++) {
        fold += SQUARE(a[UINDEX(j, i, k, n, m)]);
      }
    }
  }
  *fini = fold;
  while (true) {
    for (int i = 1; i <= n - 1; i++) {
      for (int j = i + 1; j <= n; j++) {
        dd = 0.0, pp = 0.0, dp = 0.0;
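        /* accumulate the matrix S for pivot (i, j): pp = s11, dp = s12, dd = s22 */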
        for (int k = 1; k <= m; k++) {
          p = a[UINDEX(j, i, k, n, m)];
          q = a[UINDEX(i, i, k, n, m)];
          r = a[UINDEX(j, j, k, n, m)];
          d = (q - r) / 2.0;
          dd += d * d;
          pp += p * p;
          dp += p * d;
        }
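        /* degenerate cases 2 and 3: s12 vanishes and s11 <= s22, do not rotate */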
        if ((fabs(dp) < 1e-15) && (pp <= dd)) {
          continue;
        } else {
          lbd = ((dd + pp) - sqrt(SQUARE(dd - pp) + 4.0 * SQUARE(dp))) / 2.0;
          u = dp / sqrt(SQUARE(dp) + SQUARE(pp - lbd));
        }
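        /* half-angle formulas, with cost >= 0 and sint <= 0 so that sin(2 theta) = v <= 0 */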
        cost = sqrt((1 + u) / 2);
        sint = -sqrt((1 - u) / 2);
        if (*vectors) {
          for (int l = 1; l <= n; l++) {
            piil = evec[MINDEX(l, i, n)];
            pijl = evec[MINDEX(l, j, n)];
            evec[MINDEX(l, i, n)] = cost * piil - sint * pijl;
            evec[MINDEX(l, j, n)] = sint * piil + cost * pijl;
          }
        }
        for (int k = 1; k <= m; k++) {
          for (int l = 1; l <= n; l++) {
            int il = IMIN(i, l);
            int li = IMAX(i, l);
            int jl = IMIN(j, l);
            int lj = IMAX(j, l);
            oldi[VINDEX(l)] = a[UINDEX(li, il, k, n, m)];
            oldj[VINDEX(l)] = a[UINDEX(lj, jl, k, n, m)];
          }
          for (int l = 1; l <= n; l++) {
            if (l == i)
              continue;
            if (l == j)
              continue;
            int il = IMIN(i, l);
            int li = IMAX(i, l);
            int jl = IMIN(j, l);
            int lj = IMAX(j, l);
            a[UINDEX(li, il, k, n, m)] =
                cost * oldi[VINDEX(l)] - sint * oldj[VINDEX(l)];
            a[UINDEX(lj, jl, k, n, m)] =
                sint * oldi[VINDEX(l)] + cost * oldj[VINDEX(l)];
          }
        }
        for (int k = 1; k <= m; k++) {
          int ij = IMIN(i, j);
          int ji = IMAX(j, i);
          p = a[UINDEX(ji, ij, k, n, m)];
          q = a[UINDEX(i, i, k, n, m)];
          r = a[UINDEX(j, j, k, n, m)];
          a[UINDEX(i, i, k, n, m)] =
              SQUARE(cost) * q + SQUARE(sint) * r - 2 * cost * sint * p;
          a[UINDEX(j, j, k, n, m)] =
              SQUARE(sint) * q + SQUARE(cost) * r + 2 * cost * sint * p;
          a[UINDEX(ji, ij, k, n, m)] =
              cost * sint * (q - r) + (SQUARE(cost) - SQUARE(sint)) * p;
        }
      }
    }
    fnew = 0.0;
    for (int k = 1; k <= m; k++) {
      for (int i = 1; i <= n - 1; i++) {
        for (int j = i + 1; j <= n; j++) {
          fnew += SQUARE(a[UINDEX(j, i, k, n, m)]);
        }
      }
    }
    if (*verbose == true) {
      printf("itel %4d fold %10.6f fnew %10.6f\n", itel, fold, fnew);
    }
    if (((fold - fnew) < *eps) || (itel == *itmax))
      break;
    fold = fnew;
    itel++;
  }
  *f = fnew;
  *it = itel;
  return;
}

void primat(const int *n, const int *m, const int *w, const int *p,
            const double *x) {
  for (int i = 1; i <= *n; i++) {
    for (int j = 1; j <= *m; j++) {
      printf(" %*.*f ", *w, *p, x[MINDEX(i, j, *n)]);
    }
    printf("\n");
  }
  printf("\n\n");
  return;
}

void pritru(const int *n, const int *w, const int *p, const double *x) {
  for (int i = 1; i <= *n; i++) {
    for (int j = 1; j <= i; j++) {
      printf(" %*.*f ", *w, *p, x[TINDEX(i, j, *n)]);
    }
    printf("\n");
  }
  printf("\n\n");
  return;
}

void prisru(const int *n, const int *m, const int *w, const int *p,
            const double *x) {
  for (int k = 1; k <= *m; k++) {
    for (int i = 1; i <= *n; i++) {
      for (int j = 1; j <= i; j++) {
        printf(" %*.*f ", *w, *p, x[UINDEX(i, j, k, *n, *m)]);
      }
      printf("\n");
    }
    printf("\n\n");
  }
  return;
}

void trimat(const int *n, const double *x, double *y) {
  int nn = *n;
  for (int i = 1; i <= nn; i++) {
    for (int j = 1; j <= nn; j++) {
      y[MINDEX(i, j, nn)] =
          (i >= j) ? x[TINDEX(i, j, nn)] : x[TINDEX(j, i, nn)];
    }
  }
  return;
}

void mattri(const int *n, const double *x, double *y) {
  int nn = *n;
  for (int j = 1; j <= nn; j++) {
    for (int i = j; i <= nn; i++) {
      y[TINDEX(i, j, nn)] = x[MINDEX(i, j, nn)];
    }
  }
  return;
}

References

Bunse-Gerstner, A., R. Byers, and V. Mehrmann. 1993. “Numerical Methods for Simultaneous Diagonalization.” SIAM Journal on Matrix Analysis and Applications 14 (4): 927–49.

Cardoso, J.-F., and A. Souloumiac. 1996. “Jacobi Angles for Simultaneous Diagonalization.” SIAM Journal on Matrix Analysis and Applications 17 (1): 161–64.

Clarkson, D.B. 1988. “Remark AS R74: A Least Squares Version of Algorithm AS 211: The F-G Diagonalization Algorithm.” Applied Statistics 37 (2): 317–21.

De Leeuw, J. 2008. “Using Jacobi Plane Rotations in R.” Preprint Series 556. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2008/reports/deleeuw_R_08a.pdf.

———. 2017. “Jacobi Eigen in R/C with Lower Triangular Column-wise Compact Storage.” https://doi.org/10.13140/RG.2.2.33245.72163.

De Leeuw, J., and S. Pruzansky. 1978. “A New Computational Method to Fit the Weighted Euclidean Distance Model.” Psychometrika 43: 479–90. http://www.stat.ucla.edu/~deleeuw/janspubs/1978/articles/deleeuw_pruzansky_A_78.pdf.

R Core Team. 2018. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. https://www.R-project.org/.