library(ggplot2)
library(GGally)
## Registered S3 method overwritten by 'GGally':
## method from
## +.gg ggplot2
library(CCA)
## Loading required package: fda
## Loading required package: splines
## Loading required package: fds
## Loading required package: rainbow
## Loading required package: MASS
## Loading required package: pcaPP
## Loading required package: RCurl
## Loading required package: deSolve
##
## Attaching package: 'fda'
## The following object is masked from 'package:graphics':
##
## matplot
## Loading required package: fields
## Loading required package: spam
## Spam version 2.8-0 (2022-01-05) is loaded.
## Type 'help( Spam)' or 'demo( spam)' for a short introduction
## and overview of this package.
## Help for individual functions is also obtained by adding the
## suffix '.spam' to the function name, e.g. 'help( chol.spam)'.
##
## Attaching package: 'spam'
## The following objects are masked from 'package:base':
##
## backsolve, forwardsolve
## Loading required package: viridis
## Loading required package: viridisLite
##
## Try help(fields) to get started.
DATASET
mm <- read.csv("https://stats.idre.ucla.edu/stat/data/mmreg.csv")
colnames(mm) <- c("Control", "Concept", "Motivation", "Read", "Write", "Math",
"Science", "Sex")
summary(mm)
## Control Concept Motivation Read
## Min. :-2.23000 Min. :-2.620000 Min. :0.0000 Min. :28.3
## 1st Qu.:-0.37250 1st Qu.:-0.300000 1st Qu.:0.3300 1st Qu.:44.2
## Median : 0.21000 Median : 0.030000 Median :0.6700 Median :52.1
## Mean : 0.09653 Mean : 0.004917 Mean :0.6608 Mean :51.9
## 3rd Qu.: 0.51000 3rd Qu.: 0.440000 3rd Qu.:1.0000 3rd Qu.:60.1
## Max. : 1.36000 Max. : 1.190000 Max. :1.0000 Max. :76.0
## Write Math Science Sex
## Min. :25.50 Min. :31.80 Min. :26.00 Min. :0.000
## 1st Qu.:44.30 1st Qu.:44.50 1st Qu.:44.40 1st Qu.:0.000
## Median :54.10 Median :51.30 Median :52.60 Median :1.000
## Mean :52.38 Mean :51.85 Mean :51.76 Mean :0.545
## 3rd Qu.:59.90 3rd Qu.:58.38 3rd Qu.:58.65 3rd Qu.:1.000
## Max. :67.10 Max. :75.50 Max. :74.20 Max. :1.000
Canonical correlation analysis
Below we use the canon command to conduct a canonical correlation analysis. It requires two sets of variables enclosed with a pair of parentheses.
xtabs(~Sex, data = mm)
## Sex
## 0 1
## 273 327
psych <- mm[, 1:3]
acad <- mm[, 4:8]
ggpairs(psych)
ggpairs(acad)
Next, we'll look at the correlations within and between the two sets of variables using the matcor function from the CCA package.
# correlations
matcor(psych, acad)
## $Xcor
## Control Concept Motivation
## Control 1.0000000 0.1711878 0.2451323
## Concept 0.1711878 1.0000000 0.2885707
## Motivation 0.2451323 0.2885707 1.0000000
##
## $Ycor
## Read Write Math Science Sex
## Read 1.00000000 0.6285909 0.6792757 0.6906929 -0.04174278
## Write 0.62859089 1.0000000 0.6326664 0.5691498 0.24433183
## Math 0.67927568 0.6326664 1.0000000 0.6495261 -0.04821830
## Science 0.69069291 0.5691498 0.6495261 1.0000000 -0.13818587
## Sex -0.04174278 0.2443318 -0.0482183 -0.1381859 1.00000000
##
## $XYcor
## Control Concept Motivation Read Write Math
## Control 1.0000000 0.17118778 0.24513227 0.37356505 0.35887684 0.3372690
## Concept 0.1711878 1.00000000 0.28857075 0.06065584 0.01944856 0.0535977
## Motivation 0.2451323 0.28857075 1.00000000 0.21060992 0.25424818 0.1950135
## Read 0.3735650 0.06065584 0.21060992 1.00000000 0.62859089 0.6792757
## Write 0.3588768 0.01944856 0.25424818 0.62859089 1.00000000 0.6326664
## Math 0.3372690 0.05359770 0.19501347 0.67927568 0.63266640 1.0000000
## Science 0.3246269 0.06982633 0.11566948 0.69069291 0.56914983 0.6495261
## Sex 0.1134108 -0.12595132 0.09810277 -0.04174278 0.24433183 -0.0482183
## Science Sex
## Control 0.32462694 0.11341075
## Concept 0.06982633 -0.12595132
## Motivation 0.11566948 0.09810277
## Read 0.69069291 -0.04174278
## Write 0.56914983 0.24433183
## Math 0.64952612 -0.04821830
## Science 1.00000000 -0.13818587
## Sex -0.13818587 1.00000000
R Canonical Correlation Analysis
Due to the length of the output, we will be making comments in several places along the way.
cc1 <- cc(psych, acad)
# display the canonical correlations
cc1$cor
## [1] 0.4640861 0.1675092 0.1039911
# raw canonical coefficients
cc1[3:4]
## $xcoef
## [,1] [,2] [,3]
## Control -1.2538339 -0.6214776 -0.6616896
## Concept 0.3513499 -1.1876866 0.8267210
## Motivation -1.2624204 2.0272641 2.0002283
##
## $ycoef
## [,1] [,2] [,3]
## Read -0.044620600 -0.004910024 0.021380576
## Write -0.035877112 0.042071478 0.091307329
## Math -0.023417185 0.004229478 0.009398182
## Science -0.005025152 -0.085162184 -0.109835014
## Sex -0.632119234 1.084642326 -1.794647036
Next, we'll use (comput) to compute the loadings of the variables on the canonical dimensions (variates). These loadings are correlations between variables and the canonical variates.
# compute canonical loadings
cc2 <- comput(psych, acad, cc1)
# display canonical loadings
cc2[3:6]
## $corr.X.xscores
## [,1] [,2] [,3]
## Control -0.90404631 -0.3896883 -0.1756227
## Concept -0.02084327 -0.7087386 0.7051632
## Motivation -0.56715106 0.3508882 0.7451289
##
## $corr.Y.xscores
## [,1] [,2] [,3]
## Read -0.3900402 -0.06010654 0.01407661
## Write -0.4067914 0.01086075 0.02647207
## Math -0.3545378 -0.04990916 0.01536585
## Science -0.3055607 -0.11336980 -0.02395489
## Sex -0.1689796 0.12645737 -0.05650916
##
## $corr.X.yscores
## [,1] [,2] [,3]
## Control -0.419555307 -0.06527635 -0.01826320
## Concept -0.009673069 -0.11872021 0.07333073
## Motivation -0.263206910 0.05877699 0.07748681
##
## $corr.Y.yscores
## [,1] [,2] [,3]
## Read -0.8404480 -0.35882541 0.1353635
## Write -0.8765429 0.06483674 0.2545608
## Math -0.7639483 -0.29794884 0.1477611
## Science -0.6584139 -0.67679761 -0.2303551
## Sex -0.3641127 0.75492811 -0.5434036
The above correlations are between observed variables and canonical variables which are known as the canonical loadings. These canonical variates are actually a type of latent variable.
In general, the number of canonical dimensions is equal to the number of variables in the smaller set; however, the number of significant dimensions may be even smaller. Canonical dimensions, also known as canonical variates, are latent variables that are analogous to factors obtained in factor analysis. For this particular model there are three canonical dimensions of which only the first two are statistically significant. For statistical test we use R package "CCP".
library(CCP)
# tests of canonical dimensions
rho <- cc1$cor
## Define number of observations, number of variables in first set, and number of variables in the second set.
n <- dim(psych)[1]
p <- length(psych)
q <- length(acad)
## Calculate p-values using the F-approximations of different test statistics:
p.asym(rho, n, p, q, tstat = "Wilks")
## Wilks' Lambda, using F-approximation (Rao's F):
## stat approx df1 df2 p.value
## 1 to 3: 0.7543611 11.715733 15 1634.653 0.000000000
## 2 to 3: 0.9614300 2.944459 8 1186.000 0.002905057
## 3 to 3: 0.9891858 2.164612 3 594.000 0.091092180
p.asym(rho, n, p, q, tstat = "Hotelling")
## Hotelling-Lawley Trace, using F-approximation:
## stat approx df1 df2 p.value
## 1 to 3: 0.31429738 12.376333 15 1772 0.000000000
## 2 to 3: 0.03980175 2.948647 8 1778 0.002806614
## 3 to 3: 0.01093238 2.167041 3 1784 0.090013176
p.asym(rho, n, p, q, tstat = "Pillai")
## Pillai-Bartlett Trace, using F-approximation:
## stat approx df1 df2 p.value
## 1 to 3: 0.25424936 11.000571 15 1782 0.000000000
## 2 to 3: 0.03887348 2.934093 8 1788 0.002932565
## 3 to 3: 0.01081416 2.163421 3 1794 0.090440474
p.asym(rho, n, p, q, tstat = "Roy")
## Roy's Largest Root, using F-approximation:
## stat approx df1 df2 p.value
## 1 to 1: 0.2153759 32.61008 5 594 0
##
## F statistic for Roy's Greatest Root is an upper bound.
As shown in the table above, the first test of the canonical dimensions tests whether all three dimensions are significant (they are, F = 11.72), the next test tests whether dimensions 2 and 3 combined are significant (they are, F = 2.94). Finally, the last test tests whether dimension 3, by itself, is significant (it is not). Therefore dimensions 1 and 2 must each be significant while dimension three is not.
# standardized psych canonical coefficients diagonal matrix of psych sd's
s1 <- diag(sqrt(diag(cov(psych))))
s1 %*% cc1$xcoef
## [,1] [,2] [,3]
## [1,] -0.8404196 -0.4165639 -0.4435172
## [2,] 0.2478818 -0.8379278 0.5832620
## [3,] -0.4326685 0.6948029 0.6855370
# standardized acad canonical coefficients diagonal matrix of acad sd's
s2 <- diag(sqrt(diag(cov(acad))))
s2 %*% cc1$ycoef
## [,1] [,2] [,3]
## [1,] -0.45080116 -0.04960589 0.21600760
## [2,] -0.34895712 0.40920634 0.88809662
## [3,] -0.22046662 0.03981942 0.08848141
## [4,] -0.04877502 -0.82659938 -1.06607828
## [5,] -0.31503962 0.54057096 -0.89442764
The standardized canonical coefficients are interpreted in a manner analogous to interpreting standardized regression coefficients