Note: This is a working paper which will be expanded/updated frequently. The directory gifi.stat.ucla.edu/croissants has a pdf copy of this article, the complete Rmd file with all code chunks, and R and C files with the code. Unfortunately the Norway data cannot be shared, so that part of the paper is not reproducible. If you want to knit the Rmd file, remove the Norway section.
In multiple correspondence analysis (MCA) we frequently observe two-dimensional scatter plots in which the points are on or close to a convex function with a shape similar to a parabola. This is often referred to as the horseshoe effect, but that name is not quite appropriate, since horseshoes fold back at the endpoints. We therefore suggest calling such plots croissants, which seems especially suitable if there is considerable dispersion around the convex function.
Croissants have a complicated history. In data analysis they were first described in the context of scale analysis by Guttman (1950). The MCA eigenvectors of perfect scales satisfy a second order difference equation, whose solutions are the discrete orthogonal polynomials. The first and second components define the croissant. In the French data analysis literature the horseshoe or croissant is consequently called the Effet Guttman.
The next major contribution was Lancaster (1958), who described a family of bivariate distributions for which the singular value decomposition of the density consisted of orthogonal polynomials. The most famous member of the family is the bivariate normal, which has the Hermite-Chebyshev polynomials as its components. Lancaster’s work was generalized and extended by his students and co-workers. It was taken up by Benzécri’s school, in particular in the thesis of Naouri (1970).
In ecology horseshoes were first discussed in the influential paper of Hill (1974). They were considered a nuisance, and various techniques were developed to get rid of them (Hill and Gauch 1980). Ecologists work with the Gaussian abundance model for environmental gradients, and Ihm and van Groenewoud (1975) showed that this model results in horseshoes.
In multidimensional scaling quadratic structures were first discussed and analyzed by Levelt, Van De Geer, and Plomp (1966) for musical intervals (a parabola) and by De Gruijter (1967) for political parties (an ellipse). More information on horseshoes in the MDS context is in Diaconis, Goel, and Holmes (2008) and De Leeuw (2007).
The application of Hermite-Chebyshev polynomials to the multivariate normal case in MCA was outlined in section 3.8 of De Leeuw (1974). Gifi (1980) pointed out for the first time that oscillation matrices and total positivity were the relevant mathematical theories, see also section 9.2 of Gifi (1990). Total positivity was used in a general theory by Schriever (1983), Schriever (1985) that covers all previously discussed special cases. The appropriate way to patch the decomposition of the various bivariate marginals together in the multivariate case is due to De Leeuw (1982), see also Bekker and De Leeuw (1988). Much of the work of the Gifi school on horseshoes was summarized in Van Rijckevorsel (1987).

# MCA
We give a very brief introduction to MCA to fix the terminology. We start with \(n\) measurements on \(m\) categorical variables. Variable \(j\) has \(k_j\) categories. We then expand each variable to an indicator matrix \(G_j\), indicating category membership. Thus \(G_j\) is an \(n\times k_j\) binary matrix with exactly one element equal to one in each row, and zeroes everywhere else. Define \(G:=\begin{bmatrix}G_1\mid\cdots\mid G_m\end{bmatrix}\) and \(C:=G'G\). The matrix \(C\), which contains the bivariate cross tables of the variables, is called the Burt Matrix (Tableau de Burt). Also define \(D:=\mathbf{diag}(C)\), the diagonal matrix of univariate marginals. Both \(C\) and \(D\) are of order \(K:=\sum k_j\).
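As a small illustration, the sketch below (our own helper, not part of the paper's code files) turns a vector of category labels into an indicator matrix \(G_j\); it mirrors what the burtTable() function in the appendix does internally.

indicatorMatrix <- function(z) {
  u <- sort(unique(z))              # the k_j observed categories
  ifelse(outer(z, u, "=="), 1, 0)   # n x k_j binary matrix with one 1 per row
}

indicatorMatrix(c(1, 3, 2, 2, 1))   # toy example with k_j = 3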
MCA is defined as solving the generalized eigenvalue problem \(CY=mDY\Lambda\), with \(Y'DY=I\). The generalized eigenvectors \(Y\) are called the category quantifications, and the centroids \(X:=\frac{1}{m}GY\) are called the object scores. In most cases we do not use all eigenvectors in the data analysis but only a small number \(p\) of them. The category quantifications are then in a \(K\times p\) matrix, and the object scores are in an \(n\times p\) matrix. Note that with the normalization we have chosen \(0\leq\Lambda\leq I\) and \(X'X=\frac{1}{m}\Lambda\).
In our computations we use the package geigen (Hasselman and Lapack authors 2015). Alternatively, we can define \(E:=D^{-\frac12}CD^{-\frac12}\), compute eigenvectors \(Z\) with \(EZ=mZ\Lambda\) and \(Z'Z=I\), and then set \(Y=D^{-\frac12}Z\).
Also, because of the singularities in \(G\), there is one generalized eigenvalue equal to one (with a corresponding generalized eigenvector \(y\) that has all elements equal) and there are \(m-1\) generalized eigenvalues equal to zero.
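To make the computation concrete, here is a minimal sketch of MCA along the lines just described (our own code and naming, using the base function eigen() on \(E\) instead of geigen). The first eigenvector of \(E\) is the trivial constant solution, so we skip it.

mcaSketch <- function(g, m, p = 2) {
  cc <- crossprod(g)                  # Burt matrix C = G'G
  dd <- 1 / sqrt(diag(cc))            # diagonal of D^(-1/2)
  e <- cc * outer(dd, dd)             # E = D^(-1/2) C D^(-1/2)
  ez <- eigen(e, symmetric = TRUE)
  s <- 2:(p + 1)                      # skip the trivial constant solution
  y <- dd * ez$vectors[, s]           # category quantifications Y = D^(-1/2) Z
  x <- (g %*% y) / m                  # object scores X = GY / m
  return(list(lambda = ez$values[s] / m, y = y, x = x))
}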
One simple way to reliably generate croissants is to discretize a sample from a multinormal. Of course there are various parameters to consider. The function discreteNormal() allows one to choose the size of the sample n, the number of variables m, the correlation between the variables r, and the discretization points (aka knots).
formals ("discreteNormal")
## $n
##
##
## $m
##
##
## $r
##
##
## $knots
If we instead choose the knots c(0,1,2,3), keeping the correlation between the four variables at .75, a truncated version of the croissant appears.
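As an illustration of how such data can be generated, a call along the following lines uses the discreteNormal() and repList() functions listed in the appendix (the seed and sample size are arbitrary choices of ours).

set.seed(54321)                            # arbitrary seed for the illustration
r <- matrix(.75, 4, 4) + .25 * diag(4)     # correlation .75 between all four variables
x <- discreteNormal(1000, 4, r, repList(c(0, 1, 2, 3), 4))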
Finally we can choose a correlation matrix which does not have all correlations the same. We use the correlation matrix
## [,1] [,2] [,3] [,4]
## [1,] 1.000 -0.027 0.729 0.008
## [2,] -0.027 1.000 0.001 0.343
## [3,] 0.729 0.001 1.000 -0.027
## [4,] 0.008 0.343 -0.027 1.000
This has the effect of perturbing the croissant, because it becomes a mixture of different croissants for the different variables.
In real data we have a combination of all these effects. We will have low correlations and/or skewness for some of the variables. Thus croissants can be asymmetric and have a lot of scatter. And, of course, there may be deviations from underlying normality as well, which can easily destroy the croissant.
The GALO example has served us well for almost 40 years. The objects (individuals) are 1290 school children in the sixth grade of elementary school in the city of Groningen (Netherlands) in 1959. There are four variables: gender (SEX), intelligence (IQ), teacher's advice about secondary education (ADV), and the father's occupational status (SES).
One of the standard ways to get rid of MCA croissants is to impose constraints on the category quantifications. In his seminal MCA paper, Guttman (1941) already argued that only the first dimension of an MCA should be used in linear multivariate analysis, because the remaining dimensions are components of non-linearity. MCA does not decompose variance, as PCA does in classical multivariate analysis, but it decomposes chi-square, and thus its interpretation should be completely different.
There are various ways to incorporate this insight. The simplest one is to use the first MCA dimension to quantify the variables, and to use these quantified variables in a standard linear technique such as PCA or regression. The combination of MCA quantification and PCA is sometimes called PRIMALS (Meulman and Gifi 1981).
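A sketch of this two-step idea, under our own naming (this is not the original PRIMALS program): quantify each variable with its block of the first non-trivial MCA category quantifications, and then apply an ordinary PCA to the induced correlation matrix.

primalsSketch <- function(g, ord, y1) {
  # g: indicator super-matrix, ord: numbers of categories per variable,
  # y1: first non-trivial column of the MCA category quantifications
  blocks <- rep(seq_along(ord), ord)
  q <- sapply(seq_along(ord), function(j) g[, blocks == j] %*% y1[blocks == j])
  r <- cor(q)                          # induced correlation matrix
  ev <- eigen(r, symmetric = TRUE)
  return(list(r = r, values = ev$values / length(ord), vectors = ev$vectors))
}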
The correlation matrix of the quantified variables is
## SEX IQ ADV SES
## SEX 1.0000000 0.2250425 0.1530414 0.2624973
## IQ 0.2250425 1.0000000 0.7838499 0.3591835
## ADV 0.1530414 0.7838499 1.0000000 0.3809945
## SES 0.2624973 0.3591835 0.3809945 1.0000000
with eigenvalues (divided by 4)
## [1] 0.53915911 0.23737936 0.17062083 0.05284071
and eigenvectors
## [,1] [,2] [,3] [,4]
## [1,] -0.2968564 0.8447797 -0.4381085 0.07927491
## [2,] -0.5990117 -0.2807262 -0.2625353 -0.70246210
## [3,] -0.5930407 -0.3551810 -0.1553433 0.70570362
## [4,] -0.4487360 0.2852833 0.8455794 -0.04738025
Using the first two eigenvectors we can make a PCA plot.
To show how this analysis is related to MCA, we first normalize the category quantifications of variable \(j\) to \(y_j'D_jy_j=1\). We then create the first column of \(Y\) by multiplying the \(y_j\) by the corresponding entries of the first eigenvector of the correlation matrix, and we do the same for the second column of \(Y\) using the second eigenvector. We still have \(Y'DY=I\), while \(\frac{1}{m}\mathbf{tr}\ Y'CY\) is equal to the sum of the first two eigenvalues of the correlation matrix (again divided by \(m\)), i.e. 0.7765385. For the MCA the corresponding sum is 0.9306767.
The PRIMALS approach is a two-step approach, in the sense that we first maximize the largest generalized eigenvalue of the Burt table and then maximize over the largest eigenvalues of the induced correlation matrix. Thus there are two different criteria involved, and this could be seen as a disadvantage. In the PRINCALS approach we choose quantifications of the variables that maximize the sum of the first \(p\) eigenvalues of the induced correlation matrix. For \(p=1\) this is PRIMALS, but for \(p>1\) the two approaches are different.
We perform the necessary computations, in the GALO case for \(p=2\), using the homals package (De Leeuw and Mair 2009). Alternatively, the aspect package (Mair and De Leeuw 2010) could be used.
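A call along the lines below would compute such a rank-one, two-dimensional solution. The exact arguments are an assumption on our part and should be checked against the homals documentation; galo is assumed to be a data frame with the four GALO variables.

library(homals)
h <- homals(galo, ndim = 2, rank = 1, level = "nominal")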
The correlation matrix of the quantified variables is
## SEX IQ ADV SES
## SEX 1.00000000 0.01951188 -0.07525007 0.3791437
## IQ 0.01951188 1.00000000 0.58402262 0.4182533
## ADV -0.07525007 0.58402262 1.00000000 0.3333477
## SES 0.37914374 0.41825334 0.33334774 1.0000000
with eigenvalues (divided by 4)
## [1] 0.4825477 0.3013908 0.1146095 0.1014520
and eigenvectors
## [,1] [,2] [,3] [,4]
## [1,] -0.1899217 0.8085833 0.5568847 -0.001470354
## [2,] -0.5980856 -0.2443347 0.1488185 -0.748630276
## [3,] -0.5556153 -0.3817234 0.3664577 0.641317128
## [4,] -0.5454494 0.3752077 -0.7303706 0.168115701
The largest eigenvalue is now 0.4825477, while for PRIMALS it was the larger value 0.5391591. But the sum of the two largest eigenvalues for PRINCALS is 0.7839385, while for PRIMALS it is the smaller value 0.7765385.
Using the first two eigenvectors of the induced correlation matrix we can make a PCA plot.

A more complete and more revealing analysis is possible by using the theory in De Leeuw (1982), see also Bekker and De Leeuw (1988) and De Leeuw (2015). We study an alternative way of diagonalizing the matrix \(E=D^{-\frac12}CD^{-\frac12}\).
Suppose we can find square orthonormal \(K_j\) such that \(K_j'K_j=I\) and \(K_j'E_{j\ell}K_\ell\) is diagonal for all \(j,\ell=1,\cdots,m\), say equal to the diagonal matrices \(\Gamma_{j\ell}\). The condition is that if two submatrices of \(E\) have a subscript in common, then they also have the corresponding set of singular vectors in common. De Leeuw (1988) calls this simultaneous bilinearizability. Since that name is a bit verbose, we call the condition SBL. We know SBL holds for the continuous multinormal, with the \(K_j\) the Hermite-Chebyshev polynomials. It also holds if all variables are binary, and if \(m=2\). Those are some important gauges, and we expect SBL to be approximately true in many practical situations.
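For \(m=2\) SBL is simply the singular value decomposition of the single off-diagonal block \(E_{12}\). A small check along these lines (our own code, using the functions from the appendix; the sample and knots are arbitrary):

set.seed(123)                              # arbitrary seed
x <- discreteNormal(500, 2, matrix(c(1, .7, .7, 1), 2, 2), repList(c(-1, 0, 1), 2))
b <- burtTable(x)
d <- 1 / sqrt(diag(b$c))
e <- b$c * outer(d, d)
k <- b$ord                                 # numbers of categories per variable
e12 <- e[1:k[1], k[1] + (1:k[2])]          # off-diagonal block E_12
s <- svd(e12)
round(crossprod(s$u, e12 %*% s$v), 4)      # Gamma_12 is diagonal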
Define the direct sum \(K:=K_1\oplus\cdots\oplus K_m\), and \(\Gamma:=K'EK\). Under SBL all \(\Gamma_{j\ell}\) are diagonal. We can then find a permutation \(P\) of rows and columns such that \(R:=P'\Gamma P\) is the direct sum of a number of correlation matrices. If we start with \(m\) variables with \(k\) categories each, then \(R=R_1\oplus\cdots\oplus R_k\), where each \(R_s\) is of order \(m\). The eigenvectors of \(R\) are the direct sum \(L:=L_1\oplus\cdots\oplus L_k\), and under SBL we have \[ L'RL=L'P'\Gamma PL=L'P'K'EKPL=m\Lambda. \] Thus under SBL the eigenvectors of \(E\) are given by \(Z=KPL\), and the eigenvalues of \(E\) are also the eigenvalues of \(\Gamma\) and of \(R\). Thus for each eigenvalue of \(E\) we can say which \(R_s\) it belongs to.
If the variables have different numbers of categories, then under SBL we have \(k\) matrices \(R_s\), where \(k\) is now the maximum of the \(k_j\). The \(R_s\) are generally of different orders, because variable \(j\) is represented in \(R_s\) if and only if \(k_j\geq s\). Thus a binary variable only occurs in \(R_1\) and \(R_2\). In all cases the matrix \(R_1\) can be chosen to be the \(m\times m\) matrix with all elements equal to one. If all variables are binary, the matrix \(R_2\) is the matrix of phi-coefficients (point correlations).
Now SBL will not be precisely satisfied for empirical data, except in the trivial cases \(k_j\equiv 2\) or \(m=2\). To create a suitable data analysis technique we choose the \(K_j\) in such a way that the sum of squares of the off-diagonal elements of the \(\Gamma_{j\ell}\) is minimized, or, equivalently, the sum of squares of the diagonal elements is maximized. The details and the code are in De Leeuw (2008). For this paper the approximate diagonalization procedure is in the function threeStepApprox(), which uses kplSVD() to find the \(K_j\) and kplPerm() to find the permutation \(P\).
Because the SBL theory is somewhat more complicated than other MCA theory, we analyze a simple normal example in detail.
set.seed (12345)
x <- discreteNormal (1000, 4, matrix (.75, 4, 4) + .25 * diag (4), repList (c(-2,2), 4)) # three categories per variable
b <- burtTable (x)
d <- 1 / sqrt (diag (b$c))
e <- b$c * outer (d, d) # normalized Burt table E
h <- threeStepApprox (e, rep(3, 4))
The Burt Table is
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11] [,12]
## [1,] 25 0 0 9 16 0 9 16 0 9 16 0
## [2,] 0 943 0 18 913 12 14 920 9 15 915 13
## [3,] 0 0 32 0 21 11 0 20 12 0 27 5
## [4,] 9 18 0 27 0 0 8 19 0 12 15 0
## [5,] 16 913 21 0 950 0 15 920 15 12 929 9
## [6,] 0 12 11 0 0 23 0 17 6 0 14 9
## [7,] 9 14 0 8 15 0 23 0 0 9 14 0
## [8,] 16 920 20 19 920 17 0 956 0 15 927 14
## [9,] 0 9 12 0 15 6 0 0 21 0 17 4
## [10,] 9 15 0 12 12 0 9 15 0 24 0 0
## [11,] 16 915 27 15 929 14 14 927 17 0 958 0
## [12,] 0 13 5 0 9 9 0 14 4 0 0 18
The normalized Burt Table \(E\) is
## 1.0000 0.0000 0.0000 0.3464 0.1038 0.0000 0.3753 0.1035 0.0000 0.3674 0.1034 0.0000
## 0.0000 1.0000 0.0000 0.1128 0.9646 0.0815 0.0951 0.9690 0.0640 0.0997 0.9627 0.0998
## 0.0000 0.0000 1.0000 0.0000 0.1204 0.4055 0.0000 0.1143 0.4629 0.0000 0.1542 0.2083
## 0.3464 0.1128 0.0000 1.0000 0.0000 0.0000 0.3210 0.1183 0.0000 0.4714 0.0933 0.0000
## 0.1038 0.9646 0.1204 0.0000 1.0000 0.0000 0.1015 0.9654 0.1062 0.0795 0.9738 0.0688
## 0.0000 0.0815 0.4055 0.0000 0.0000 1.0000 0.0000 0.1146 0.2730 0.0000 0.0943 0.4423
## 0.3753 0.0951 0.0000 0.3210 0.1015 0.0000 1.0000 0.0000 0.0000 0.3831 0.0943 0.0000
## 0.1035 0.9690 0.1143 0.1183 0.9654 0.1146 0.0000 1.0000 0.0000 0.0990 0.9687 0.1067
## 0.0000 0.0640 0.4629 0.0000 0.1062 0.2730 0.0000 0.0000 1.0000 0.0000 0.1199 0.2057
## 0.3674 0.0997 0.0000 0.4714 0.0795 0.0000 0.3831 0.0990 0.0000 1.0000 0.0000 0.0000
## 0.1034 0.9627 0.1542 0.0933 0.9738 0.0943 0.0943 0.9687 0.1199 0.0000 1.0000 0.0000
## 0.0000 0.0998 0.2083 0.0000 0.0688 0.4423 0.0000 0.1067 0.2057 0.0000 0.0000 1.0000
and its eigenvalues are
## 4.0000 2.1028 1.9429 0.9545 0.7356 0.6493 0.6118 0.5338 0.4695 -0.0000 -0.0000 -0.0000
After application of kplSVD() to \(E\) we have
## 1.0000 0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000
## 0.0000 1.0000 0.0000 -0.0000 0.3313 -0.0209 -0.0000 0.3622 -0.0212 -0.0000 0.3529 -0.0049
## 0.0000 0.0000 1.0000 0.0000 -0.0226 0.3858 0.0000 -0.0218 0.4459 0.0000 -0.0080 0.1868
## 1.0000 -0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000
## 0.0000 0.3313 -0.0226 -0.0000 1.0000 -0.0000 0.0000 0.3054 -0.0141 0.0000 0.4590 -0.0102
## 0.0000 -0.0209 0.3858 -0.0000 -0.0000 1.0000 0.0000 -0.0136 0.2541 0.0000 -0.0117 0.4289
## 1.0000 -0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 0.0000 -0.0000 1.0000 0.0000 -0.0000
## -0.0000 0.3622 -0.0218 -0.0000 0.3054 -0.0136 0.0000 1.0000 0.0000 -0.0000 0.3692 -0.0024
## -0.0000 -0.0212 0.4459 -0.0000 -0.0141 0.2541 -0.0000 0.0000 1.0000 -0.0000 -0.0042 0.1883
## 1.0000 -0.0000 0.0000 1.0000 0.0000 0.0000 1.0000 -0.0000 -0.0000 1.0000 0.0000 -0.0000
## 0.0000 0.3529 -0.0080 0.0000 0.4590 -0.0117 0.0000 0.3692 -0.0042 -0.0000 1.0000 0.0000
## -0.0000 -0.0049 0.1868 -0.0000 -0.0102 0.4289 -0.0000 -0.0024 0.1883 -0.0000 -0.0000 1.0000
Permuting rows and columns gives the matrix \(R=R_1\oplus R_2\oplus R_3\)
## 1.0000 1.0000 1.0000 1.0000 0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000
## 1.0000 1.0000 1.0000 1.0000 -0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 -0.0000
## 1.0000 1.0000 1.0000 1.0000 -0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000
## 1.0000 1.0000 1.0000 1.0000 -0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000
## 0.0000 -0.0000 -0.0000 -0.0000 1.0000 0.3313 0.3622 0.3529 0.0000 -0.0209 -0.0212 -0.0049
## 0.0000 -0.0000 0.0000 0.0000 0.3313 1.0000 0.3054 0.4590 -0.0226 -0.0000 -0.0141 -0.0102
## -0.0000 -0.0000 0.0000 -0.0000 0.3622 0.3054 1.0000 0.3692 -0.0218 -0.0136 0.0000 -0.0024
## 0.0000 0.0000 0.0000 -0.0000 0.3529 0.4590 0.3692 1.0000 -0.0080 -0.0117 -0.0042 0.0000
## 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0226 -0.0218 -0.0080 1.0000 0.3858 0.4459 0.1868
## 0.0000 -0.0000 0.0000 0.0000 -0.0209 -0.0000 -0.0136 -0.0117 0.3858 1.0000 0.2541 0.4289
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0212 -0.0141 0.0000 -0.0042 0.4459 0.2541 1.0000 0.1883
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0049 -0.0102 -0.0024 -0.0000 0.1868 0.4289 0.1883 1.0000
And we can now diagonalize the blocks.
## 4.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000 0.0000 -0.0000
## -0.0000 -0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000
## 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000
## -0.0000 -0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 -0.0000 -0.0000 0.0000 -0.0000 -0.0000
## -0.0000 -0.0000 -0.0000 0.0000 2.0923 -0.0000 0.0000 -0.0000 -0.0395 -0.0077 -0.0066 -0.0015
## -0.0000 -0.0000 0.0000 0.0000 -0.0000 0.7349 -0.0000 0.0000 -0.0037 0.0047 -0.0058 -0.0110
## 0.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.6412 -0.0000 0.0045 -0.0007 -0.0144 0.0158
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000 -0.0000 0.5315 -0.0083 -0.0058 0.0038 0.0126
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0395 -0.0037 0.0045 -0.0083 1.9532 -0.0000 -0.0000 -0.0000
## 0.0000 0.0000 0.0000 0.0000 -0.0077 0.0047 -0.0007 -0.0058 -0.0000 0.9543 0.0000 0.0000
## 0.0000 -0.0000 -0.0000 -0.0000 -0.0066 -0.0058 -0.0144 0.0038 -0.0000 0.0000 0.6185 0.0000
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0015 -0.0110 0.0158 0.0126 -0.0000 0.0000 0.0000 0.4740
and we can sort the diagonal elements
## 4.0000 2.0923 1.9532 0.9543 0.7349 0.6412 0.6185 0.5315 0.4740 0.0000 -0.0000 -0.0000
to find they are really close to the eigenvalues of \(E\). The correlations between the eigenvectors of \(E\) and the SBL approximations are
## -1.0000 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 -0.0000
## -0.0000 0.0000 0.0000 -0.0000 -0.9669 -0.0007 0.0007 -0.0014 0.2551 0.0065 0.0043 0.0009
## -0.0000 0.0000 0.0000 -0.0000 -0.2551 0.0029 -0.0033 0.0057 -0.9669 0.0020 0.0013 0.0003
## 0.0000 -0.0000 -0.0000 -0.0000 0.0068 0.0215 -0.0023 -0.0138 0.0002 0.9996 -0.0006 -0.0009
## -0.0000 -0.0000 -0.0000 -0.0000 -0.0003 0.9976 0.0009 -0.0030 0.0030 -0.0216 -0.0500 -0.0420
## -0.0000 -0.0000 -0.0000 -0.0000 0.0019 0.0182 -0.9045 0.0044 0.0032 -0.0022 0.4180 -0.0826
## 0.0000 0.0000 0.0000 0.0000 0.0041 0.0472 0.4154 0.0503 -0.0008 0.0011 0.9057 0.0486
## 0.0000 -0.0000 -0.0000 0.0000 0.0002 0.0091 -0.0356 0.9786 0.0059 0.0134 -0.0490 0.1957
## 0.0000 -0.0000 -0.0000 -0.0000 -0.0009 -0.0404 0.0901 0.1987 0.0007 0.0029 0.0020 -0.9751
## -0.0000 -0.4411 -0.4836 0.7560 0.0000 -0.0000 0.0000 -0.0000 -0.0000 -0.0000 -0.0000 0.0000
## 0.0000 0.0000 0.8424 0.5389 0.0000 -0.0000 -0.0000 0.0000 -0.0000 0.0000 -0.0000 -0.0000
## -0.0000 -0.8975 0.2377 -0.3716 0.0000 -0.0000 0.0000 0.0000 -0.0000 0.0000 -0.0000 0.0000
Thus the eight non-trivial eigenvectors of \(E\) correspond, respectively, with the first eigenvector of \(R_2\), the first of \(R_3\), the second of \(R_3\), the second of \(R_2\), the third of \(R_2\), the third of \(R_3\), the fourth of \(R_2\), and the fourth of \(R_3\). In terms of the polynomials underlying the category quantifications this means linear, quadratic, quadratic, linear, linear, quadratic, linear, quadratic.
Back to GALO. For GALO the eigenvalues of \(E/4\) are
## 1.0000 0.5392 0.3915 0.3833 0.3465 0.3083 0.2764 0.2702 0.2569 0.2517 0.2454 0.2414 0.2379 0.2282 0.2183 0.1902 0.1706 0.1519 0.1283 0.1150 0.0490 -0.0000 -0.0000 -0.0000
The direct sum of the eigenvalues of the \(R_s/m\), which have orders 4, 4, 3, 3, 3, 3, 2, 1, 1, is
## 1.0000 0.0000 0.0000 -0.0000
## 0.5376 0.2471 0.1636 0.0517
## 0.3908 0.2425 0.1167
## 0.3505 0.2442 0.1553
## 0.3238 0.2527 0.1735
## 0.2688 0.2606 0.2206
## 0.2716 0.2284
## 0.2500
## 0.2500
and if we sort these we find
## 1.0000 0.5376 0.3908 0.3505 0.3238 0.2716 0.2688 0.2606 0.2527 0.2500 0.2500 0.2471 0.2442 0.2425 0.2284 0.2206 0.1735 0.1636 0.1553 0.1167 0.0517 0.0000 0.0000 -0.0000
Clearly the approximate eigenvalues from the KPL diagonalization and the actual eigenvalues are very similar, especially the larger ones and the smaller ones. The correlation between actual eigenvectors and approximate eigenvectors is \(Q:=Z'KPL\). Let us look at the first three non-trivial eigenvectors of \(E\), which correspond with rows 2, 3, and 4 of \(Q\), and regress them on the non-trivial columns of \(KPL\), which are all columns except the first four. The percentages of variance explained by the columns of \(KPL\) are the squares of the elements of rows 2, 3, and 4 of \(Q\).
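A sketch of this computation, assuming the threeStepApprox() output for GALO is stored in an object hGalo (the name is ours):

q <- hGalo$cc                      # Q = Z'KPL, returned as the component cc
round((q * q)[2:4, -(1:4)], 2)     # squared correlations for rows 2, 3, 4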
## 0.99 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 0.00 0.00 0.00 0.00 0.98 0.00 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
## 0.00 0.31 0.01 0.00 0.01 0.01 0.00 0.20 0.07 0.00 0.26 0.02 0.00 0.01 0.08 0.00 0.00 0.00 0.00 0.01
Thus we see that the first non-trivial eigenvector of \(E\) corresponds with the first eigenvector of \(R_2\), while the next eigenvector of \(E\) corresponds with the first eigenvector of \(R_3\). The third non-trivial eigenvector of \(E\) is most closely related to the second eigenvector of \(R_2\), but there are also contributions of the first eigenvectors of \(R_4\) and \(R_5\).
There is a somewhat more rigorous method to match the eigenvectors in \(Z\) and \(KPL\). The Hadamard product \(Q*Q\) is doubly stochastic. Finding the permutation matrix closest in the least squares sense to \(Q*Q\) is a linear assignment problem that can be solved by using lp.assign() from the lpSolve package (Berkelaar and others 2015). We indicate for each row the column where the permutation matrix has a one.
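A sketch of this step, again assuming the GALO result is stored in hGalo; lp.assign() maximizes the total assigned weight, and we then read off the position of the one in each row of the solution.

library(lpSolve)
q <- hGalo$cc
a <- lp.assign(q * q, direction = "max")   # assignment maximizing sum of squared correlations
apply(a$solution, 1, which.max)            # column of the one in each row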
## [1] 1 5 9 6 12 15 21 18 23 24 13 16 10 22 20 7 17 14 19 11 8 2 3
## [24] 4
Again this indicates that the first three non-trivial eigenvectors of \(E\) correspond with the dominant eigenvalue of \(R_2\), the dominant eigenvalue of \(R_3\), and the second eigenvalue of \(R_2\). If the eigenvectors in \(K\) correspond with the orthogonal polynomials, as they do in the multinormal case, then the first two nontrivial eigenvectors produce a croissant, because they come from the linear and quadratic transformations, respectively. But if we select the first and the third non-trivial eigenvector of \(E\) to plot, then they both come from the linear quantifications and they do not produce a croissant.
We first plot \(Y=D^{-\frac12}KPL\), using the columns corresponding to the first two non-trivial eigenvectors of \(E\). This shows the croissant.

The data are register data from Norway, with information on a large number (n = 159539) of individuals. The analysis is part of a social mobility study conducted by Arild Danielsen, Department of History, Sociology and Innovation, Faculty of Economics and Social Sciences, University College of Southeast Norway. See Danielsen (2015). The data consist of four variables recording the class-fractions (based on occupational categories) of the individual's two parents (in 2008) and two grandfathers (in 1980).
Thus we have information for each of the 159539 individuals about the membership of Mother, Father, Father’s Father, and Mother’s Father in the following categories.
In order to simplify our analysis we decided to eliminate all individuals with scores in categories 12-14. This reduces the number of observations to 81259. For the actual sociological analysis this would probably be a wasteful and unwise decision, but for our croissant illustration it makes sense. The marginals of the remaining eleven categories are as follows.
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] 716 3290 1117 2081 4902 6618 1308 3229 4799 18730 34469
## [2,] 650 3451 1072 1996 4987 6873 1384 3265 5030 18868 33683
## [3,] 1373 3839 3710 4198 11458 11078 860 4534 8630 17506 14073
## [4,] 639 1069 284 6466 10570 2587 3090 14977 7225 17579 16773
Using additivity does not really get rid of the wedge, but allows us to study it in more detail and take it apart. SBL theory gives us another tool to look at the wedge.
For Norway the eigenvalues of \(E/4\) are
## 1.0000 0.4487 0.3095 0.2912 0.2783 0.2745 0.2620 0.2607 0.2594 0.2585 0.2569 0.2561 0.2550 0.2537 0.2527 0.2517 0.2512 0.2501 0.2498 0.2484 0.2481 0.2477 0.2470 0.2468 0.2457 0.2449 0.2446 0.2426 0.2425 0.2414 0.2387 0.2380 0.2365 0.2339 0.2304 0.2288 0.2198 0.2134 0.1984 0.1845 0.1580 0.0000 -0.0000 -0.0000
The direct sum of the eigenvalues of the \(R_s/m\) is
## 1.0000 0.0000 -0.0000 0.0000
## 0.4486 0.2054 0.1869 0.1591
## 0.3081 0.2430 0.2407 0.2082
## 0.2909 0.2470 0.2413 0.2208
## 0.2742 0.2479 0.2452 0.2328
## 0.2766 0.2497 0.2438 0.2299
## 0.2597 0.2530 0.2482 0.2391
## 0.2577 0.2522 0.2471 0.2430
## 0.2576 0.2500 0.2466 0.2457
## 0.2577 0.2510 0.2488 0.2425
## 0.2551 0.2534 0.2497 0.2418
Note that each row here adds up to one. Also note that the orders of the matrices \(R_s\) in this case are 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, because all variables have 11 categories. If we sort the eigenvalues of all \(R_s/m\) we find
## 1.0000 0.4486 0.3081 0.2909 0.2766 0.2742 0.2597 0.2577 0.2577 0.2576 0.2551 0.2534 0.2530 0.2522 0.2510 0.2500 0.2497 0.2497 0.2488 0.2482 0.2479 0.2471 0.2470 0.2466 0.2457 0.2452 0.2438 0.2430 0.2430 0.2425 0.2418 0.2413 0.2407 0.2391 0.2328 0.2299 0.2208 0.2082 0.2054 0.1869 0.1591 0.0000 0.0000 -0.0000
The two sets of sorted eigenvalues are plotted in Figure 27. It is pretty obvious they are close.
The assignment problem for the eigenvectors gives the solution
## [1] 1 5 9 13 21 17 25 33 29 26 37 42 30 22 41 38 43 31 27 34 19 39 18
## [24] 35 36 14 40 32 10 23 15 28 11 44 20 24 16 12 6 7 8 2 3 4
Again we see the familiar pattern: the eigenvalues of \(E\) start with the first eigenvalue of the (trivial) \(R_1\), followed by the first eigenvalue of \(R_2\), of \(R_3\), and of \(R_4\). The remaining eigenvalues of \(R_2\) are actually the smallest ones of the whole sequence, except for the trivial zeroes. This tells us that these data are highly one-dimensional. We see the familiar wedge again in Figure 28, which plots the dominant eigenvectors corresponding to \(R_2\) and \(R_3\).
discreteNormal <- function(n, m, r, knots) {
  # sample n observations from an m-variate normal with correlation matrix r
  x <- matrix(rnorm(n * m), n, m)
  x <- x %*% chol(r)
  # discretize each variable at its knots: category s is the s-th interval between knots
  for (j in 1:m) {
    x[, j] <- apply(outer(x[, j], c(knots[[j]], Inf), ">"), 1, which.min)
  }
  return(x)
}
repList <- function(x, m) {
z <- vector("list", m)
return(lapply(z, function(i) x))
}
burtTable <- function(x, degrees = rep(-1, ncol(x)), knots = NULL, center = FALSE,
standardize = FALSE, orthonormalize = FALSE) {
n <- nrow(x)
m <- ncol(x)
g <- matrix(0, n, 0)
l <- rep(0, m)
for (j in 1:m) {
z <- x[, j]
if (degrees[j] < 0) {
h <- ifelse(outer(z, unique(z), "=="), 1, 0)
} else {
h <- bsplineBasis(z, degrees[j], knots[[j]])
}
if (center) {
h <- center(h)[, -1]
}
if (standardize) {
h <- standardize(h)
}
if (orthonormalize) {
h <- gs(h)$q
}
g <- cbind(g, h)
l[j] <- ncol(h)
}
return(list(c = crossprod(g), g = g, ord = l))
}
directSum <- function(x) {
m <- length(x)
nr <- sum(sapply(x, nrow))
nc <- sum(sapply(x, ncol))
z <- matrix(0, nr, nc)
kr <- 0
kc <- 0
for (i in 1:m) {
ir <- nrow(x[[i]])
ic <- ncol(x[[i]])
z[kr + (1:ir), kc + (1:ic)] <- x[[i]]
kr <- kr + ir
kc <- kc + ic
}
return(z)
}
kplPerm <- function(cc, k) {
kl <- unlist(sapply(k, function(i) 1:i))
p <- ifelse(outer(1:sum(k), order(kl), "=="), 1, 0)
return(list(pcp = t(p) %*% cc %*% p, perm = p, ord = as.vector(table(kl))))
}
makeE <- function(cc, k) {
dd <- mInvSqrt(blockSelect(cc, k))
return(dd %*% cc %*% dd)
}
mInvSqrt <- function(x) {
ex <- eigen(x)
ew <- abs(ex$values)
ev <- ifelse(ew == 0, 0, 1/sqrt(ew))
ey <- ex$vectors
return(ey %*% (ev * t(ey)))
}
blockSelect <- function(cc, k) {
l <- unlist(lapply(1:length(k), function(i) rep(i, k[i])))
return(cc * ifelse(outer(l, l, "=="), 1, 0))
}
kplSVD <- function(e, k, eps = 1e-06, itmax = 500, verbose = TRUE, vectors = TRUE) {
m <- length(k)
sk <- sum(k)
ll <- kk <- ww <- diag(sk)
itel <- 1
ossq <- 0
klw <- 1 + cumsum(c(0, k))[1:m]
kup <- cumsum(k)
ind <- lapply(1:m, function(i) klw[i]:kup[i])
for (i in 1:m) kk[ind[[i]], ind[[i]]] <- t(svd(e[ind[[i]], ])$u)
kek <- kk %*% e %*% t(kk)
for (i in 1:m) for (j in 1:m) ww[ind[[i]], ind[[j]]] <- ifelse(outer(1:k[i],
1:k[j], "=="), 1, 0)
repeat {
for (l in 1:m) {
if (k[l] == 2) next
li <- ind[[l]]
for (i in (klw[l] + 1):(kup[l] - 1)) for (j in (i + 1):kup[l]) {
bi <- kek[i, -li]
bj <- kek[j, -li]
wi <- ww[i, -li]
wj <- ww[j, -li]
acc <- sum(wi * bi^2) + sum(wj * bj^2)
acs <- sum((wi - wj) * bi * bj)
ass <- sum(wi * bj^2) + sum(wj * bi^2)
u <- eigen(matrix(c(acc, acs, acs, ass), 2, 2))$vectors[, 1]
c <- u[1]
s <- u[2]
kek[-li, i] <- kek[i, -li] <- c * bi + s * bj
kek[-li, j] <- kek[j, -li] <- c * bj - s * bi
if (vectors) {
ki <- kk[i, li]
kj <- kk[j, li]
kk[i, li] <- c * ki + s * kj
kk[j, li] <- c * kj - s * ki
}
}
}
nssq <- sum(ww * kek^2) - sum(diag(kek)^2)
if (verbose)
cat("Iteration ", formatC(itel, digits = 4), "ssq ", formatC(nssq,
digits = 10, width = 15), "\n")
if (((nssq - ossq) < eps) || (itel == itmax)) break
itel <- itel + 1
ossq <- nssq
}
return(list(kek = kek, kk = kk, itel = itel, ssq = nssq))
}
inducedR <- function(c, y, k) {
m <- length(k)
l <- unlist(lapply(1:m, function(i) rep(i, k[i])))
g <- ifelse(outer(l, 1:m, "=="), 1, 0)
s <- g * matrix(y, length(y), m)
r <- crossprod(s, c %*% s)
e <- abs(diag(r))
d <- ifelse(e == 0, 0, 1 / sqrt(e))  # guard against zero diagonal elements
return(r * outer(d, d))
}
threeStepApprox <- function(e, k, eps = 1e-06, itmax = 500, verbose = FALSE,
vectors = TRUE) {
ee <- eigen(e)
ev <- ee$vectors
ea <- ee$values
f <- kplSVD(e, k, eps = eps, itmax = itmax, verbose = verbose, vectors = TRUE)
ef <- f$kek
kf <- f$kk
g <- kplPerm(ef, k)
eg <- g$pcp
kg <- crossprod(g$perm, kf)
gg <- g$ord
gl <- cumsum(gg)
lg <- 1 + c(0, gl)[-length(gl) - 1]
bb <- lapply(1:length(gl), function(i) eigen(eg[lg[i]:gl[i], lg[i]:gl[i]])$vectors)
vc <- directSum(bb)
eh <- crossprod(vc, eg %*% vc)
kh <- crossprod(kg, vc)
cc <- crossprod(ev, kh)
mc <- apply(cc, 1, function(x) max(abs(x)))
wc <- apply(cc, 1, function(x) which.max(abs(x)))
yc <- apply(cc, 2, function(x) max(abs(x)))
vc <- apply(cc, 2, function(x) which.max(abs(x)))
return(list(eval = ea, evec = ev, aval = diag(eh), avec = kh, cc = cc, mc = mc,
wc = wc, yc = yc, vc = vc, gg = gg))
}
0.01 12/04/15
0.02 12/06/15
0.03 12/07/15
1.00 12/08/15
1.01 12/09/15
1.02 12/10/15
1.03 12/12/15
1.04 12/13/15
2.00 12/14/15
2.01 12/16/15
Bekker, P., and J. De Leeuw. 1988. “Relation Between Variants of Nonlinear Principal Component Analysis.” In Component and Correspondence Analysis, edited by J.L.A. Van Rijckevorsel and J. De Leeuw, 1–31. Wiley Series in Probability and Mathematical Statistics. Chichester, England: Wiley. http://www.stat.ucla.edu/~deleeuw/janspubs/1988/chapters/bekker_deleeuw_C_88.pdf.
Berkelaar, M., and others. 2015. lpSolve: Interface to ’Lpsolve’ v. 5.5 to Solve Linear/Integer Programs. https://CRAN.R-project.org/package=lpSolve.
Danielsen, A. 2015. “An Accumulation Approach to Cultural Capital.” Paper presented at the conference Changing elites in Europe LSE, London 26th-27th November 2015.
De Gruijter, D.N.M. 1967. The Cognitive Structure of Dutch Political Parties in 1966. Report E019-67. Psychological Institute, University of Leiden.
De Leeuw, J. 1974. Canonical Analysis of Categorical Data. Leiden, The Netherlands: Psychological Institute, Leiden University. http://www.stat.ucla.edu/~deleeuw/janspubs/1974/books/deleeuw_B_74.pdf.
———. 1982. “Nonlinear Principal Component Analysis.” In COMPSTAT 1982, edited by H. Caussinus, P. Ettinger, and R. Tomassone, 77–86. Vienna, Austria: Physika Verlag. http://www.stat.ucla.edu/~deleeuw/janspubs/1982/chapters/deleeuw_C_82.pdf.
———. 1988. “Multivariate Analysis with Linearizable Regressions.” Psychometrika 53: 437–54. http://www.stat.ucla.edu/~deleeuw/janspubs/1988/articles/deleeuw_A_88a.pdf.
———. 2007. A Horseshoe for Multidimensional Scaling. Preprint Series 530. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2007/reports/deleeuw_R_07a.pdf.
———. 2008. Using Jacobi Plane Rotations in R. Preprint Series 556. Los Angeles, CA: UCLA Department of Statistics. http://www.stat.ucla.edu/~deleeuw/janspubs/2008/reports/deleeuw_R_08a.pdf.
———. 2015. “Multiset Canonical Correlation Analysis.” http://rpubs.com/deleeuw/126462.
De Leeuw, J., and P. Mair. 2009. “Homogeneity Analysis in R: the Package homals.” Journal of Statistical Software 31 (4): 1–21. http://www.stat.ucla.edu/~deleeuw/janspubs/2009/articles/deleeuw_mair_A_09a.pdf.
Diaconis, P., S. Goel, and S. Holmes. 2008. “Horseshoes in Multidimensional Scaling and Kernel Methods.” Annals of Applied Statistics 2: 777–807.
Gifi, A. 1980. Niet-Lineaire Multivariate Analyse [Nonlinear Multivariate Analysis]. Leiden, The Netherlands: Department of Data Theory FSW/RUL. http://www.stat.ucla.edu/~deleeuw/janspubs/1980/books/gifi_B_80.pdf.
———. 1990. Nonlinear Multivariate Analysis. New York, N.Y.: Wiley.
Guttman, L. 1941. “The Quantification of a Class of Attributes: A Theory and Method of Scale Construction.” In The Prediction of Personal Adjustment, edited by P. Horst, 321–48. New York: Social Science Research Council.
———. 1950. “The Principal Components of Scale Analysis.” In Measurement and Prediction, edited by S.A. Stouffer and Others. Princeton: Princeton University Press.
Hasselman, B., and Lapack authors. 2015. geigen: Calculate Generalized Eigenvalues, the Generalized Schur Decomposition and the Generalized Singular Value Decomposition of a Matrix Pair with Lapack. https://CRAN.R-project.org/package=geigen.
Hill, M.O. 1974. “Correspondence Analysis: a Neglected Multivariate Method.” Applied Statistics 23: 340–54.
Hill, M.O., and H.G. Gauch. 1980. “Detrended Correspondence Analysis: An Improved Ordination Technique.” Vegetatio 42: 47–58.
Ihm, P., and H. van Groenewoud. 1975. “A Multivariate Ordering of Vegetation Data Based on Gaussian Type Gradient Response Curves.” The Journal of Ecology 63: 767–77.
Lancaster, H.O. 1958. “The Structure of Bivariate Distributions.” Annals of Mathematical Statistics 29: 719–36.
Levelt, W.J.M., J.P. Van De Geer, and R. Plomp. 1966. “Triadic Comparisons of Musical Intervals.” British Journal of Mathematical and Statistical Psychology 19: 163–79.
Mair, P., and J. De Leeuw. 2010. “A General Framework for Multivariate Analysis with Optimal Scaling: The R Package aspect.” Journal of Statistical Software 32 (9): 1–23. http://www.stat.ucla.edu/~deleeuw/janspubs/2010/articles/mair_deleeuw_A_10.pdf.
Meulman, J.J., and A. Gifi. 1981. One-dimensional Homogeneity Analysis, Including Principal Components Analysis with Optimal Scaling. Department of Data Theory, Leiden University.
Naouri, J.C. 1970. “Analyse Factorielle des Correspondances Continues.” Publications de l’Institut de Statistique de l’Université de Paris 19: 1–100.
Schriever, B.F. 1983. “Scaling of Order-dependent Categorical Variables with Correspondence Analysis.” International Statistical Review 51: 225–38.
———. 1985. “Order Dependence.” PhD thesis, University of Amsterdam, The Netherlands.
Van Rijckevorsel, J.L.A. 1987. “The Application of Fuzzy Coding and Horseshoes in Multiple Correspondence Analysis.” PhD thesis, University of Leiden, The Netherlands.