Things you may need to know/do.

Relates to lecture 4.
Libraries ggplot2, MASS, tidyverse, GGally may be useful. Others may be mentioned below in the hints.
You might like to set up a project for each lab if you are using RStudio. Then you can copy a .Rmd file into that directory and write your answers in that file.

Question 1 Aeroplane data

Consider the aeroplane data. The data are available in the Data Sets folder. Use the six logged variables as in previous labs. Scale the logged variables and work with the six scaled logged variables in this lab.

(a). Calculate the factor loadings based on the principal component factor analysis approach, and extract the first two and five factor loadings. (Hint: see p14 of the lecture slides.) Recall that the factor loadings are calculated from the eigenvectors of the covariance matrix the eigenvalues as shown on p14.)

(b). Use varimax to calculate the VC optimal factor loadings A_{VC,2} and A_{VC,5} for the 2-factor model and the 5-factor model respectively.

(c). Display the unrotated or plain factor loadings A_2 of part (a), the VC optimal factor loading A_{VC,2}, and the first two components of the A_{VC,5} in biplots.

(d). List the factor loadings you have computed.

(e). Compare the results of parts (b)–(d) and comment on similarities and differences.

Calculate the factor loadings based on the principal component factor analysis approach, and extract the first two and five factor loadings. (Hint: see p14 of the lecture slides.) Recall that the factor loadings are calculated from the eigenvectors of the covariance matrix the eigenvalues as shown on p14.)

# Get the aircraft data and check
aircraft0 = read.csv("aircraft.csv")
aircraft = mutate(aircraft0, Period = factor(Period), logPower = log10(Power), logSpan = log10(Span),
    logLength = log10(Length), logWeight = log10(Weight), logSpeed = log10(Speed),
    logRange = log10(Range)) %>%
    dplyr::select(starts_with("log"))  # have droppped the Period and Year as well, implicitly
summary(aircraft)

    logPower        logSpan         logLength        logWeight    
 Min.   :1.477   Min.   :0.8248   Min.   :0.7551   Min.   :2.682  
 1st Qu.:2.624   1st Qu.:1.0414   1st Qu.:0.9552   1st Qu.:3.378  
 Median :3.006   Median :1.1553   Median :1.0777   Median :3.740  
 Mean   :3.084   Mean   :1.2038   Mean   :1.1265   Mean   :3.801  
 3rd Qu.:3.489   3rd Qu.:1.3444   3rd Qu.:1.2514   3rd Qu.:4.146  
 Max.   :4.903   Max.   :1.8457   Max.   :1.8782   Max.   :5.562  
    logSpeed        logRange    
 Min.   :2.021   Min.   :1.903  
 1st Qu.:2.413   1st Qu.:2.929  
 Median :2.611   Median :3.146  
 Mean   :2.637   Mean   :3.172  
 3rd Qu.:2.816   3rd Qu.:3.380  
 Max.   :3.508   Max.   :4.304

head(aircraft)

  logPower   logSpan logLength logWeight logSpeed logRange
1 1.913814 1.1072100 0.8808136  3.029384 2.021189 2.602060
2 1.913814 1.0413927 0.9542425  2.919078 2.161368 2.604226
3 2.349472 1.2528530 1.0149403  3.342423 2.130334 2.698970
4 2.214844 1.1613680 0.9912261  3.289143 2.139879 2.698970
5 2.075547 1.1105897 0.8976271  3.075547 2.146128 2.602060
6 1.872156 0.8750613 0.7993405  2.814913 2.247973 2.544068

Compute the PCA decomposition and assemble the parts.

PCA.air = prcomp(aircraft, scale. = TRUE)
str(PCA.air)

List of 5
 $ sdev    : num [1:6] 2.112 1.034 0.616 0.221 0.174 ...
 $ rotation: num [1:6, 1:6] 0.448 0.363 0.451 0.468 0.307 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:6] 3.08 1.2 1.13 3.8 2.64 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ scale   : Named num [1:6] 0.654 0.207 0.221 0.567 0.295 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ x       : num [1:709, 1:6] -3.41 -3.32 -2.1 -2.43 -3.09 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
 - attr(*, "class")= chr "prcomp"

We obtain A=ΓΛ^1/2

where Γ Γ- is the rotation component of the output of prcomp and Λ Λ - is the component sdev. Then taking the first k columns of this matrix, we get what we need for slide p14.

A = PCA.air$rotation %*% diag(PCA.air$sdev)
A

               [,1]         [,2]        [,3]        [,4]        [,5]
logPower  0.9461937 -0.262870434  0.13346042 -0.05782988  0.10614720
logSpan   0.7657987  0.620278874  0.09739206 -0.11593991 -0.07449777
logLength 0.9523188  0.200492891  0.14297982  0.17639057 -0.02790295
logWeight 0.9875903  0.008911862  0.11765556 -0.01288556  0.04208941
logSpeed  0.6474923 -0.754247710  0.01625343 -0.02769967 -0.10407778
logRange  0.8221100  0.075844814 -0.56408990  0.00752432  0.01095895
                   [,6]
logPower  -0.0565106742
logSpan   -0.0181467371
logLength -0.0235820986
logWeight  0.0938380089
logSpeed  -0.0006826508
logRange  -0.0029278709

(b). Use varimax to calculate the VC optimal factor loadings A_{VC,2} and A_{VC,5} for the 2-factor model and the 5-factor model respectively.

Use varimax on the PCs.

A_VC2 = varimax(A[, 1:2])  ## 2-factor
A_VC5 = varimax(A[, 1:5])  ## 5-factor

A_VC2

$loadings

Loadings:
          [,1]   [,2]  
logPower   0.608 -0.771
logSpan    0.984       
logLength  0.887 -0.401
logWeight  0.802 -0.576
logSpeed         -0.991
logRange   0.708 -0.424

                [,1]  [,2]
SS loadings    3.277 2.251
Proportion Var 0.546 0.375
Cumulative Var 0.546 0.921

$rotmat
          [,1]       [,2]
[1,] 0.8070450 -0.5904899
[2,] 0.5904899  0.8070450

A_VC5

$loadings

Loadings:
          [,1]   [,2]   [,3]   [,4]   [,5]  
logPower   0.556 -0.760 -0.263 -0.201       
logSpan    0.952        -0.288        -0.100
logLength  0.842 -0.387 -0.299         0.226
logWeight  0.743 -0.564 -0.318 -0.130       
logSpeed         -0.981 -0.187              
logRange   0.404 -0.297 -0.865              

               [,1]  [,2]  [,3]  [,4]  [,5]
SS loadings    2.64 2.096 1.125 0.060 0.067
Proportion Var 0.44 0.349 0.187 0.010 0.011
Cumulative Var 0.44 0.789 0.977 0.987 0.998

$rotmat
            [,1]        [,2]        [,3]        [,4]        [,5]
[1,]  0.70456087 -0.55914764 -0.42599841 -0.07786386  0.05839932
[2,]  0.58753288  0.80292712 -0.09011528  0.01106364 -0.04324333
[3,]  0.38720983 -0.17803096  0.89836742 -0.08722293  0.06084136
[4,] -0.02864490  0.06917122 -0.02763021  0.13589110  0.98750443
[5,] -0.08744781  0.07862746 -0.05076561 -0.98373807  0.12590818

str(A_VC5)

List of 2
 $ loadings: 'loadings' num [1:6, 1:5] 0.5563 0.9515 0.8415 0.7433 0.0292 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : NULL
 $ rotmat  : num [1:5, 1:5] 0.7046 0.5875 0.3872 -0.0286 -0.0874 ...

(c). Display the unrotated or plain factor loadings A_2 of part (a), the VC optimal factor loading A_{VC,2}, and the first two components of the A_{VC,5} in biplots.

Need to put some scores into the first slot to make it parse correctly. Then don’t show by setting colour to “white”. (Crazy, but it beats having to write the code yourself!)

Plain Factor loadings A_2

str(PCA.air)

List of 5
 $ sdev    : num [1:6] 2.112 1.034 0.616 0.221 0.174 ...
 $ rotation: num [1:6, 1:6] 0.448 0.363 0.451 0.468 0.307 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:6] 3.08 1.2 1.13 3.8 2.64 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ scale   : Named num [1:6] 0.654 0.207 0.221 0.567 0.295 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ x       : num [1:709, 1:6] -3.41 -3.32 -2.1 -2.43 -3.09 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:6] "PC1" "PC2" "PC3" "PC4" ...
 - attr(*, "class")= chr "prcomp"

biplot(PCA.air$x, A[, 1:2], col = c("white", "blue"))

the VC optimal factor loading A_{VC,2}

biplot(PCA.air$x, A_VC2$loadings, col = c("white", "blue"))

first two components of the A_{VC,5}

biplot(PCA.air$x, A_VC5$loadings, col = c("white", "blue"))

(d). List the factor loadings you have computed.

# PCA.air$rotation[,1:2]
list.loadings <- list(A[, 1:2], A_VC2$loadings, A_VC5$loadings[, 1:2])
list.loadings

[[1]]
               [,1]         [,2]
logPower  0.9461937 -0.262870434
logSpan   0.7657987  0.620278874
logLength 0.9523188  0.200492891
logWeight 0.9875903  0.008911862
logSpeed  0.6474923 -0.754247710
logRange  0.8221100  0.075844814

[[2]]

Loadings:
          [,1]   [,2]  
logPower   0.608 -0.771
logSpan    0.984       
logLength  0.887 -0.401
logWeight  0.802 -0.576
logSpeed         -0.991
logRange   0.708 -0.424

                [,1]  [,2]
SS loadings    3.277 2.251
Proportion Var 0.546 0.375
Cumulative Var 0.546 0.921

[[3]]
                [,1]        [,2]
logPower  0.55625739 -0.75954192
logSpan   0.95153293  0.03862813
logLength 0.84151326 -0.38695323
logWeight 0.74329937 -0.56358147
logSpeed  0.02924073 -0.98064274
logRange  0.40419283 -0.29697540

(e). Compare the results of parts (b)–(d) and comment on similarities and differences.

   - Much clearer interpretation after rotation (VC2, VC5).

   - Component 1: LogSpan, LogLength, LogWeight, LogRange

   - Component 2: logPower, logSpeed

Addition from Darfiana The aim is to perform PCFA using psych R package

library(psych)
cov.air = cov(aircraft)
cov.air

            logPower    logSpan  logLength  logWeight   logSpeed   logRange
logPower  0.42773036 0.07774532 0.12356457 0.35153693 0.15486795 0.15017579
logSpan   0.07774532 0.04283590 0.03882906 0.09037839 0.00247371 0.04310932
logLength 0.12356457 0.03882906 0.04875837 0.11940641 0.03030218 0.05330035
logWeight 0.35153693 0.09037839 0.11940641 0.32166993 0.10537134 0.14218951
logSpeed  0.15486795 0.00247371 0.03030218 0.10537134 0.08681238 0.04598387
logRange  0.15017579 0.04310932 0.05330035 0.14218951 0.04598387 0.11284750

pca.air2 <- principal(cov.air, 2, rotate = "varimax")
names(pca.air2)

 [1] "values"       "rotation"     "n.obs"        "communality"  "loadings"    
 [6] "fit"          "fit.off"      "fn"           "Call"         "uniquenesses"
[11] "complexity"   "valid"        "chi"          "EPVAL"        "R2"          
[16] "objective"    "residual"     "rms"          "factors"      "dof"         
[21] "null.dof"     "null.model"   "criteria"     "PVAL"         "weights"     
[26] "r.scores"     "rot.mat"      "Vaccounted"   "Structure"

pca.air2$loadings


Loadings:
          RC1    RC2   
logPower   0.608  0.771
logSpan    0.984       
logLength  0.887  0.401
logWeight  0.802  0.576
logSpeed          0.991
logRange   0.708  0.424

                 RC1   RC2
SS loadings    3.277 2.251
Proportion Var 0.546 0.375
Cumulative Var 0.546 0.921

# PCA.air$rotation[,1:2]
list.loadings2 <- list(A_VC2$loadings, pca.air2$loadings)
list.loadings2

[[1]]

Loadings:
          [,1]   [,2]  
logPower   0.608 -0.771
logSpan    0.984       
logLength  0.887 -0.401
logWeight  0.802 -0.576
logSpeed         -0.991
logRange   0.708 -0.424

                [,1]  [,2]
SS loadings    3.277 2.251
Proportion Var 0.546 0.375
Cumulative Var 0.546 0.921

[[2]]

Loadings:
          RC1    RC2   
logPower   0.608  0.771
logSpan    0.984       
logLength  0.887  0.401
logWeight  0.802  0.576
logSpeed          0.991
logRange   0.708  0.424

                 RC1   RC2
SS loadings    3.277 2.251
Proportion Var 0.546 0.375
Cumulative Var 0.546 0.921

The loadings are the same as when being calculated manually.

str(pca.air2)

List of 29
 $ values      : num [1:6] 4.4591 1.0688 0.38 0.0489 0.0303 ...
 $ rotation    : chr "varimax"
 $ n.obs       : logi NA
 $ communality : Named num [1:6] 0.964 0.971 0.947 0.975 0.988 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ loadings    : 'loadings' num [1:6, 1:2] 0.6084 0.9843 0.887 0.8023 0.0772 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : chr [1:2] "RC1" "RC2"
 $ fit         : num 0.993
 $ fit.off     : num 0.997
 $ fn          : chr "principal"
 $ Call        : language principal(r = cov.air, nfactors = 2, rotate = "varimax")
 $ uniquenesses: Named num [1:6] 0.0356 0.0288 0.0529 0.0246 0.0119 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ complexity  : Named num [1:6] 1.9 1 1.39 1.81 1.01 ...
  ..- attr(*, "names")= chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ valid       : num [1:2] 0.927 0.928
 $ chi         : num NA
 $ EPVAL       : num NA
 $ R2          : Named num [1:2] 1 1
  ..- attr(*, "names")= chr [1:2] "RC1" "RC2"
 $ objective   : num 0.455
 $ residual    : num [1:6, 1:6] 0.03562 0.01282 0.00725 0.01561 -0.00724 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
 $ rms         : num 0.0369
 $ factors     : int 2
 $ dof         : num 4
 $ null.dof    : num 15
 $ null.model  : num 10.3
 $ criteria    : Named num [1:3] 0.455 NA NA
  ..- attr(*, "names")= chr [1:3] "objective" "" ""
 $ PVAL        : logi NA
 $ weights     : num [1:6, 1:2] 0.026 0.481 0.283 0.184 -0.3 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:2] "RC1" "RC2"
 $ r.scores    : num [1:2, 1:2] 1.00 -6.05e-15 -6.10e-15 1.00
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "RC1" "RC2"
  .. ..$ : chr [1:2] "RC1" "RC2"
 $ rot.mat     : num [1:2, 1:2] 0.807 -0.59 0.59 0.807
 $ Vaccounted  : num [1:5, 1:2] 3.277 0.546 0.546 0.593 0.593 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:5] "SS loadings" "Proportion Var" "Cumulative Var" "Proportion Explained" ...
  .. ..$ : chr [1:2] "RC1" "RC2"
 $ Structure   : 'loadings' num [1:6, 1:2] 0.6084 0.9843 0.887 0.8023 0.0772 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:6] "logPower" "logSpan" "logLength" "logWeight" ...
  .. ..$ : chr [1:2] "RC1" "RC2"
 - attr(*, "class")= chr [1:2] "psych" "principal"

# Assuming 'data' is your original dataset and 'pca.air2' is your PCA result
scores <- cov.air %*% pca.air2$loadings

# Now, create the biplot
biplot(scores, pca.air2$loadings[, 1:2], col = c("white", "blue"))

pca.air2 <- principal(cov.air, 5, rotate = "varimax")

scores <- cov.air %*% pca.air2$loadings

biplot(scores, pca.air2$loadings[, 1:5], col = c("white", "blue"))

Question 2 Wine data

Consider the 13-dimensional wine recognition data. The data are available in the Data Sets folder. Ignore the first column for the calculations in this lab, as these are the labels, and note that the file is tsv (not csv).

Read the data into R.

(a). Scale the data and work with the scaled data in this question.

wine0 = read_csv(file = "wine.tsv", col_names = FALSE)
summary(wine0)

       X1              X2              X3              X4       
 Min.   :1.000   Min.   :11.03   Min.   :0.740   Min.   :1.360  
 1st Qu.:1.000   1st Qu.:12.36   1st Qu.:1.603   1st Qu.:2.210  
 Median :2.000   Median :13.05   Median :1.865   Median :2.360  
 Mean   :1.938   Mean   :13.00   Mean   :2.336   Mean   :2.367  
 3rd Qu.:3.000   3rd Qu.:13.68   3rd Qu.:3.083   3rd Qu.:2.558  
 Max.   :3.000   Max.   :14.83   Max.   :5.800   Max.   :3.230  
       X5              X6               X7              X8       
 Min.   :10.60   Min.   : 70.00   Min.   :0.980   Min.   :0.340  
 1st Qu.:17.20   1st Qu.: 88.00   1st Qu.:1.742   1st Qu.:1.205  
 Median :19.50   Median : 98.00   Median :2.355   Median :2.135  
 Mean   :19.49   Mean   : 99.74   Mean   :2.295   Mean   :2.029  
 3rd Qu.:21.50   3rd Qu.:107.00   3rd Qu.:2.800   3rd Qu.:2.875  
 Max.   :30.00   Max.   :162.00   Max.   :3.880   Max.   :5.080  
       X9              X10             X11              X12        
 Min.   :0.1300   Min.   :0.410   Min.   : 1.280   Min.   :0.4800  
 1st Qu.:0.2700   1st Qu.:1.250   1st Qu.: 3.220   1st Qu.:0.7825  
 Median :0.3400   Median :1.555   Median : 4.690   Median :0.9650  
 Mean   :0.3619   Mean   :1.591   Mean   : 5.058   Mean   :0.9574  
 3rd Qu.:0.4375   3rd Qu.:1.950   3rd Qu.: 6.200   3rd Qu.:1.1200  
 Max.   :0.6600   Max.   :3.580   Max.   :13.000   Max.   :1.7100  
      X13             X14        
 Min.   :1.270   Min.   : 278.0  
 1st Qu.:1.938   1st Qu.: 500.5  
 Median :2.780   Median : 673.5  
 Mean   :2.612   Mean   : 746.9  
 3rd Qu.:3.170   3rd Qu.: 985.0  
 Max.   :4.000   Max.   :1680.0

head(wine0)

# A tibble: 6 × 14
     X1    X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12   X13
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1  14.2  1.71  2.43  15.6   127  2.8   3.06  0.28  2.29  5.64  1.04  3.92
2     1  13.2  1.78  2.14  11.2   100  2.65  2.76  0.26  1.28  4.38  1.05  3.4 
3     1  13.2  2.36  2.67  18.6   101  2.8   3.24  0.3   2.81  5.68  1.03  3.17
4     1  14.4  1.95  2.5   16.8   113  3.85  3.49  0.24  2.18  7.8   0.86  3.45
5     1  13.2  2.59  2.87  21     118  2.8   2.69  0.39  1.82  4.32  1.04  2.93
6     1  14.2  1.76  2.45  15.2   112  3.27  3.39  0.34  1.97  6.75  1.05  2.85
# ℹ 1 more variable: X14 <dbl>

#Omit X1

wine <- subset(wine0, select = c(-X1))
head(wine)

# A tibble: 6 × 13
     X2    X3    X4    X5    X6    X7    X8    X9   X10   X11   X12   X13   X14
  <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1  14.2  1.71  2.43  15.6   127  2.8   3.06  0.28  2.29  5.64  1.04  3.92  1065
2  13.2  1.78  2.14  11.2   100  2.65  2.76  0.26  1.28  4.38  1.05  3.4   1050
3  13.2  2.36  2.67  18.6   101  2.8   3.24  0.3   2.81  5.68  1.03  3.17  1185
4  14.4  1.95  2.5   16.8   113  3.85  3.49  0.24  2.18  7.8   0.86  3.45  1480
5  13.2  2.59  2.87  21     118  2.8   2.69  0.39  1.82  4.32  1.04  2.93   735
6  14.2  1.76  2.45  15.2   112  3.27  3.39  0.34  1.97  6.75  1.05  2.85  1450

summary(wine)

       X2              X3              X4              X5       
 Min.   :11.03   Min.   :0.740   Min.   :1.360   Min.   :10.60  
 1st Qu.:12.36   1st Qu.:1.603   1st Qu.:2.210   1st Qu.:17.20  
 Median :13.05   Median :1.865   Median :2.360   Median :19.50  
 Mean   :13.00   Mean   :2.336   Mean   :2.367   Mean   :19.49  
 3rd Qu.:13.68   3rd Qu.:3.083   3rd Qu.:2.558   3rd Qu.:21.50  
 Max.   :14.83   Max.   :5.800   Max.   :3.230   Max.   :30.00  
       X6               X7              X8              X9        
 Min.   : 70.00   Min.   :0.980   Min.   :0.340   Min.   :0.1300  
 1st Qu.: 88.00   1st Qu.:1.742   1st Qu.:1.205   1st Qu.:0.2700  
 Median : 98.00   Median :2.355   Median :2.135   Median :0.3400  
 Mean   : 99.74   Mean   :2.295   Mean   :2.029   Mean   :0.3619  
 3rd Qu.:107.00   3rd Qu.:2.800   3rd Qu.:2.875   3rd Qu.:0.4375  
 Max.   :162.00   Max.   :3.880   Max.   :5.080   Max.   :0.6600  
      X10             X11              X12              X13       
 Min.   :0.410   Min.   : 1.280   Min.   :0.4800   Min.   :1.270  
 1st Qu.:1.250   1st Qu.: 3.220   1st Qu.:0.7825   1st Qu.:1.938  
 Median :1.555   Median : 4.690   Median :0.9650   Median :2.780  
 Mean   :1.591   Mean   : 5.058   Mean   :0.9574   Mean   :2.612  
 3rd Qu.:1.950   3rd Qu.: 6.200   3rd Qu.:1.1200   3rd Qu.:3.170  
 Max.   :3.580   Max.   :13.000   Max.   :1.7100   Max.   :4.000  
      X14        
 Min.   : 278.0  
 1st Qu.: 500.5  
 Median : 673.5  
 Mean   : 746.9  
 3rd Qu.: 985.0  
 Max.   :1680.0

dim(wine)

[1] 178  13

Scale the data

wine0 = scale(wine)  # the scaled data
head(wine0)

            X2          X3         X4         X5         X6        X7        X8
[1,] 1.5143408 -0.56066822  0.2313998 -1.1663032 1.90852151 0.8067217 1.0319081
[2,] 0.2455968 -0.49800856 -0.8256672 -2.4838405 0.01809398 0.5670481 0.7315653
[3,] 0.1963252  0.02117152  1.1062139 -0.2679823 0.08810981 0.8067217 1.2121137
[4,] 1.6867914 -0.34583508  0.4865539 -0.8069748 0.92829983 2.4844372 1.4623994
[5,] 0.2948684  0.22705328  1.8352256  0.4506745 1.27837900 0.8067217 0.6614853
[6,] 1.4773871 -0.51591132  0.3043010 -1.2860793 0.85828399 1.5576991 1.3622851
             X9        X10        X11        X12       X13         X14
[1,] -0.6577078  1.2214385  0.2510088  0.3611585 1.8427215  1.01015939
[2,] -0.8184106 -0.5431887 -0.2924962  0.4049085 1.1103172  0.96252635
[3,] -0.4970050  2.1299594  0.2682629  0.3174085 0.7863692  1.39122370
[4,] -0.9791134  1.0292513  1.1827317 -0.4263410 1.1807407  2.32800680
[5,]  0.2261576  0.4002753 -0.3183774  0.3611585 0.4483365 -0.03776747
[6,] -0.1755994  0.6623487  0.7298108  0.4049085 0.3356589  2.23274072

(b). Calculate a two-factor model based on the PC approach and show the result in a biplot.

A two-factor model, particularly in the context of statistics and data analysis, typically refers to a model that explains a set of data using two underlying factors or variables. This is often used in factor analysis, a technique for reducing the dimensionality of data while preserving as much information as possible.

In your question, you mention the “PC approach,” which likely refers to Principal Component Analysis (PCA). PCA is a statistical procedure that uses an orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component, in turn, has the highest variance possible under the constraint that it is orthogonal to the preceding components.

To calculate a two-factor model based on PCA and show the result in a biplot, you would typically follow these steps:

Data Preparation: Standardize your data if the variables are measured on different scales. This is important because PCA is affected by the scale of the variables.

Perform PCA: Apply PCA to your data to identify the principal components. For a two-factor model, you would focus on the first two principal components.

Create a Biplot: A biplot is a graphical display of data where both the scores (the transformed variable values) and the loadings (the coefficients of the linear combination) are displayed. In the context of PCA, a biplot allows you to visualize how each variable contributes to the two principal components and how the observations are distributed according to these components.

PCA.wine = prcomp(wine0)
str(PCA.wine)

List of 5
 $ sdev    : num [1:13] 2.169 1.58 1.203 0.959 0.924 ...
 $ rotation: num [1:13, 1:13] -0.14433 0.24519 0.00205 0.23932 -0.14199 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:13] "X2" "X3" "X4" "X5" ...
  .. ..$ : chr [1:13] "PC1" "PC2" "PC3" "PC4" ...
 $ center  : Named num [1:13] -8.59e-16 -6.73e-17 8.05e-16 -7.68e-17 -4.10e-17 ...
  ..- attr(*, "names")= chr [1:13] "X2" "X3" "X4" "X5" ...
 $ scale   : Named num [1:13] 0.812 1.117 0.274 3.34 14.282 ...
  ..- attr(*, "names")= chr [1:13] "X2" "X3" "X4" "X5" ...
 $ x       : num [1:178, 1:13] -3.31 -2.2 -2.51 -3.75 -1.01 ...
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : NULL
  .. ..$ : chr [1:13] "PC1" "PC2" "PC3" "PC4" ...
 - attr(*, "class")= chr "prcomp"

We obtain A=ΓΛ^1/2

where Γ Γ- is the rotation component of the output of prcomp and Λ Λ - is the component sdev. Then taking the first k columns of this matrix, we get what we need for slide p14.

A <- PCA.wine$rotation %*% diag(PCA.wine$sdev)
A  # factor loadings, PC approach

            [,1]         [,2]       [,3]        [,4]        [,5]        [,6]
X2  -0.313093350 -0.764257253 -0.2493833 -0.01711761  0.24539445 -0.17105193
X3   0.531884726 -0.355431713  0.1070404  0.51467982 -0.03252695 -0.43000667
X4   0.004449362 -0.499446109  0.7530514 -0.20531539  0.13211313 -0.12373961
X5   0.519157081  0.016734916  0.7360433  0.05834174 -0.06105952  0.08076395
X6  -0.308022936 -0.473476124  0.1572388 -0.33724320 -0.67157727 -0.03055463
X7  -0.856136658 -0.102774237  0.1757842  0.18987451  0.13792594  0.06738490
X8  -0.917470177  0.005309113  0.1811991  0.14599455  0.10070755  0.01515560
X9   0.647607018 -0.045476816  0.2048724 -0.19489072  0.46250110  0.20714285
X10 -0.679921705 -0.062103856  0.1797229  0.38254807 -0.12641790  0.42758878
X11  0.192235968 -0.837489383 -0.1651145  0.06319842  0.07060492  0.33534860
X12 -0.643662066  0.441242229  0.1024817 -0.41007505  0.16036834 -0.08489588
X13 -0.816018903  0.259933849  0.1996251  0.17650390  0.09344276 -0.21295601
X14 -0.622050797 -0.576612723 -0.1524154 -0.22247039  0.14582396 -0.09590437
           [,7]        [,8]        [,9]       [,10]       [,11]       [,12]
X2  -0.04186374 -0.23385552 -0.27337033 -0.10599316  0.10734946  0.10939488
X3   0.31216029 -0.03885994  0.04046279  0.15481866 -0.03634380 -0.04999474
X4  -0.11073126  0.10051073  0.16537823  0.01358715  0.23696431  0.02038569
X5  -0.21302087 -0.25264649 -0.10773660 -0.02644732 -0.22775659  0.02290009
X6   0.23968041  0.09230588 -0.14587224 -0.03399631 -0.03387451 -0.02556105
X7  -0.02072907  0.23963777 -0.15373655  0.16035433 -0.14461448  0.12483994
X8  -0.04504741  0.11053780 -0.02664723  0.08172236  0.01220912  0.01762355
X9   0.44200815  0.13771648 -0.10507717 -0.10796188 -0.05554567 -0.01739898
X10  0.27624380 -0.21737775  0.11241025 -0.06721294  0.11278810  0.03925476
X11 -0.16903364  0.01995156 -0.03021554  0.14564978 -0.01512891 -0.24822424
X12  0.17227272 -0.25775493 -0.04613066  0.26167047  0.02290901 -0.10648940
X13 -0.03322867  0.04610995 -0.07375610 -0.26232514 -0.02205906 -0.24688379
X14  0.05701297 -0.07085378  0.30947094 -0.08120417 -0.25624604  0.03261950
           [,13]
X2  -0.004813211
X3  -0.008347978
X4   0.045405042
X5  -0.029478274
X6  -0.018254297
X7   0.149157711
X8  -0.267590948
X9  -0.036666594
X10  0.037591689
X11  0.003855979
X12  0.028901456
X13  0.050388703
X14 -0.004645171

biplot(PCA.wine$x, A[, 1:2], col = c("white", "blue"))

(c). Calculate the sample covariance matrix of the scaled data and the eigenvalues of this matrix. For $k=2$, calculate $\hat{\sigma}^2$ as in Box 6.7 and list this value.

We compute \[\hat{\sigma}^2 = \frac{1}{d-k} \sum_{j>k} \hat{\lambda}_j\].

(Here the $\lambda$ are the eigenvalues of the correlation matrix.)

cov(wine0)

             X2          X3           X4          X5          X6          X7
X2   1.00000000  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  1.00000000  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  1.000000000  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  1.00000000 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  1.00000000  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  1.00000000
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   1.0000000 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  1.0000000 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  1.000000000 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  1.00000000 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  1.00000000  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  1.000000000
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  1.0000000

cor(wine0)

             X2          X3           X4          X5          X6          X7
X2   1.00000000  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  1.00000000  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  1.000000000  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  1.00000000 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  1.00000000  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  1.00000000
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   1.0000000 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  1.0000000 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  1.000000000 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  1.00000000 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  1.00000000  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  1.000000000
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  1.0000000

eig = eigen(cov(wine0))
eig$values

 [1] 4.7058503 2.4969737 1.4460720 0.9189739 0.8532282 0.6416570 0.5510283
 [8] 0.3484974 0.2888799 0.2509025 0.2257886 0.1687702 0.1033779

sigma_hat_sq = mean(eig$values[3:13])  #K = 2 so everything between 3 and 13
sigma_hat_sq

[1] 0.527016

Eigenvalues

[1] 4.7058503 2.4969737 1.4460720 0.9189739 0.8532282 0.6416570 0.5510283 [8] 0.3484974 0.2888799 0.2509025 0.2257886 0.1687702 0.1033779

For k=2 σ^2 =0.527016

(d). Calculate the sample equivalent of $\Sigma_A$ (see p14 right branch) for the scaled data and list the matrix.

ΣA = Σ − Ω

Part Σ = S

S = cov(wine0)  # wine0 is the scaled wine data
S

             X2          X3           X4          X5          X6          X7
X2   1.00000000  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  1.00000000  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  1.000000000  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  1.00000000 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  1.00000000  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  1.00000000
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   1.0000000 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  1.0000000 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  1.000000000 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  1.00000000 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  1.00000000  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  1.000000000
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  1.0000000

ΣA = Σ − Ω

The provided code snippet is written in R, a programming language commonly used for statistical computing and graphics. In this snippet, the function diag() is used to create a diagonal matrix named Om. The rep(sigma_hat_sq, 13) part of the code replicates a value stored in the variable sigma_hat_sq 13 times. This replicated vector is then passed to the diag() function.

The diag() function, when given a vector as input, creates a diagonal matrix where the elements of the vector are placed on the main diagonal, and all off-diagonal elements are zeros. Therefore, the resulting matrix Om is a 13x13 diagonal matrix where each diagonal element is equal to the value of sigma_hat_sq, and all off-diagonal elements are zero. This kind of matrix is often used in statistics and econometrics, particularly in the context of variance-covariance matrices, where sigma_hat_sq might represent an estimate of variance or squared standard deviation for a particular variable or model error term.

Part Ω = Om

Om = diag(rep(sigma_hat_sq, 13))  # sigma_hat_sq is value calculated in (c)
Om

          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
 [1,] 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [2,] 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [3,] 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000
 [4,] 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000
 [5,] 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000
 [6,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000
 [7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000
 [8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016
 [9,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[10,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[11,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[12,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[13,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
          [,9]    [,10]    [,11]    [,12]    [,13]
 [1,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [2,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [3,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [4,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [5,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [6,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [7,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [8,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [9,] 0.527016 0.000000 0.000000 0.000000 0.000000
[10,] 0.000000 0.527016 0.000000 0.000000 0.000000
[11,] 0.000000 0.000000 0.527016 0.000000 0.000000
[12,] 0.000000 0.000000 0.000000 0.527016 0.000000
[13,] 0.000000 0.000000 0.000000 0.000000 0.527016

Finally:

ΣA = Σ − Ω

S_A = ΣA

S_A = S - Om
S_A

             X2          X3           X4          X5          X6          X7
X2   0.47298400  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  0.47298400  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  0.472983999  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  0.47298400 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  0.47298400  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  0.47298400
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   0.4729840 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  0.4729840 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  0.472983999 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  0.47298400 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  0.47298400  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  0.472983999
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  0.4729840

(e). In this part you may follow the code chunk provided to see what happens when you use a particular value of $\hat{\sigma}^2$ as calculated in part (c). This code calculates the factor loadings ($A$ in the right hand branch) for the 2-factor principal axis factoring using the value of $\hat{\sigma}^2$ calculated in part (c). You don’t need to do anything apart from commenting on the difference in the results of parts (b) and (e). In particular look at the eigenvalues of $\Sigma_A$ and comment.

Code chunk for (d) and (e)

Notation: We compute A specifically, using the scaled data. Use the right branch of p14. Write S for the covariance matrix of the scaled data, corresponding to $\Sigma$, Om for $\Omega$ and S_A for $\Sigma_A$.

The loadings will be in the matrix A. We set $k=2$.

(Don’t forget to take out the eval = FALSE.)

S = cov(wine0)  # wine0 is the scaled wine data
S

             X2          X3           X4          X5          X6          X7
X2   1.00000000  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  1.00000000  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  1.000000000  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  1.00000000 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  1.00000000  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  1.00000000
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   1.0000000 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  1.0000000 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  1.000000000 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  1.00000000 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  1.00000000  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  1.000000000
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  1.0000000

Om = diag(rep(sigma_hat_sq, 13))  # sigma_hat_sq is value calculated in (c)
Om

          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]     [,8]
 [1,] 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [2,] 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
 [3,] 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000 0.000000
 [4,] 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000 0.000000
 [5,] 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000 0.000000
 [6,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000 0.000000
 [7,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016 0.000000
 [8,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.527016
 [9,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[10,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[11,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[12,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
[13,] 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000
          [,9]    [,10]    [,11]    [,12]    [,13]
 [1,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [2,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [3,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [4,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [5,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [6,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [7,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [8,] 0.000000 0.000000 0.000000 0.000000 0.000000
 [9,] 0.527016 0.000000 0.000000 0.000000 0.000000
[10,] 0.000000 0.527016 0.000000 0.000000 0.000000
[11,] 0.000000 0.000000 0.527016 0.000000 0.000000
[12,] 0.000000 0.000000 0.000000 0.527016 0.000000
[13,] 0.000000 0.000000 0.000000 0.000000 0.527016

S_A = S - Om
S_A

             X2          X3           X4          X5          X6          X7
X2   0.47298400  0.09439694  0.211544596 -0.31023514  0.27079823  0.28910112
X3   0.09439694  0.47298400  0.164045470  0.28850040 -0.05457510 -0.33516700
X4   0.21154460  0.16404547  0.472983999  0.44336719  0.28658669  0.12897954
X5  -0.31023514  0.28850040  0.443367187  0.47298400 -0.08333309 -0.32111332
X6   0.27079823 -0.05457510  0.286586691 -0.08333309  0.47298400  0.21440123
X7   0.28910112 -0.33516700  0.128979538 -0.32111332  0.21440123  0.47298400
X8   0.23681493 -0.41100659  0.115077279 -0.35136986  0.19578377  0.86456350
X9  -0.15592947  0.29297713  0.186230446  0.36192172 -0.25629405 -0.44993530
X10  0.13669791 -0.22074619  0.009651935 -0.19732684  0.23644061  0.61241308
X11  0.54636420  0.24898534  0.258887259  0.01873198  0.19995001 -0.05513642
X12 -0.07174720 -0.56129569 -0.074666889 -0.27395522  0.05539820  0.43368134
X13  0.07234319 -0.36871043  0.003911231 -0.27676855  0.06600394  0.69994936
X14  0.64372004 -0.19201056  0.223626264 -0.44059693  0.39335085  0.49811488
            X8         X9          X10         X11         X12          X13
X2   0.2368149 -0.1559295  0.136697912  0.54636420 -0.07174720  0.072343187
X3  -0.4110066  0.2929771 -0.220746187  0.24898534 -0.56129569 -0.368710428
X4   0.1150773  0.1862304  0.009651935  0.25888726 -0.07466689  0.003911231
X5  -0.3513699  0.3619217 -0.197326836  0.01873198 -0.27395522 -0.276768549
X6   0.1957838 -0.2562940  0.236440610  0.19995001  0.05539820  0.066003936
X7   0.8645635 -0.4499353  0.612413084 -0.05513642  0.43368134  0.699949365
X8   0.4729840 -0.5378996  0.652691769 -0.17237940  0.54347857  0.787193902
X9  -0.5378996  0.4729840 -0.365845099  0.13905701 -0.26263963 -0.503269596
X10  0.6526918 -0.3658451  0.472983999 -0.02524993  0.29554425  0.519067096
X11 -0.1723794  0.1390570 -0.025249931  0.47298400 -0.52181319 -0.428814942
X12  0.5434786 -0.2626396  0.295544253 -0.52181319  0.47298400  0.565468293
X13  0.7871939 -0.5032696  0.519067096 -0.42881494  0.56546829  0.472983999
X14  0.4941931 -0.3113852  0.330416700  0.31610011  0.23618345  0.312761075
           X14
X2   0.6437200
X3  -0.1920106
X4   0.2236263
X5  -0.4405969
X6   0.3933508
X7   0.4981149
X8   0.4941931
X9  -0.3113852
X10  0.3304167
X11  0.3161001
X12  0.2361834
X13  0.3127611
X14  0.4729840

Use eigen on the matrix S_A.

This code performs a Principal Component Analysis (PCA) on a matrix S_A. It calculates the eigenvalues and eigenvectors (eig_A), selects the first two principal components (Gamma_hat_2), and computes the square roots of the first two eigenvalues (Lambda_hat_2). The estimated factor loadings matrix (Ahat) is then calculated. Finally, two biplots are created for visualizing the PCA results: one using the estimated factor loadings and the other using the original factor loadings from matrix A. The biplot function is used for visualization, with different colors assigned to the points for clarity.

# Calculate the eigenvalues and eigenvectors of the matrix S_A
eig_A = eigen(S_A)
# Display the eigenvalues and eigenvectors
eig_A

eigen() decomposition
$values
 [1]  4.17883425  1.96995773  0.91905597  0.39195792  0.32621218  0.11464103
 [7]  0.02401231 -0.17851864 -0.23813606 -0.27611352 -0.30122736 -0.35824577
[13] -0.42363807

$vectors
              [,1]         [,2]        [,3]        [,4]        [,5]        [,6]
 [1,] -0.144329395 -0.483651548 -0.20738262 -0.01785630 -0.26566365  0.21353865
 [2,]  0.245187580 -0.224930935  0.08901289  0.53689028  0.03521363  0.53681385
 [3,]  0.002051061 -0.316068814  0.62622390 -0.21417556 -0.14302547  0.15447466
 [4,]  0.239320405  0.010590502  0.61208035  0.06085941  0.06610294 -0.10082451
 [5,] -0.141992042 -0.299634003  0.13075693 -0.35179658  0.72704851  0.03814394
 [6,] -0.394660845 -0.065039512  0.14617896  0.19806835 -0.14931841 -0.08412230
 [7,] -0.422934297  0.003359812  0.15068190  0.15229479 -0.10902584 -0.01892002
 [8,]  0.298533103 -0.028779488  0.17036816 -0.20330102 -0.50070298 -0.25859401
 [9,] -0.313429488 -0.039301722  0.14945431  0.39905653  0.13685982 -0.53379539
[10,]  0.088616705 -0.529995672 -0.13730621  0.06592568 -0.07643678 -0.41864414
[11,] -0.296714564  0.279235148  0.08522192 -0.42777141 -0.17361452  0.10598274
[12,] -0.376167411  0.164496193  0.16600459  0.18412074 -0.10116099  0.26585107
[13,] -0.286752227 -0.364902832 -0.12674592 -0.23207086 -0.15786880  0.11972557
             [,7]        [,8]        [,9]       [,10]       [,11]       [,12]
 [1,]  0.05639636  0.39613926  0.50861912  0.21160473 -0.22591696 -0.26628645
 [2,] -0.42052391  0.06582674 -0.07528304 -0.30907994  0.07648554  0.12169604
 [3,]  0.14917061 -0.17026002 -0.30769445 -0.02712539 -0.49869142 -0.04962237
 [4,]  0.28696914  0.42797018  0.20044931  0.05279942  0.47931378 -0.05574287
 [5,] -0.32288330 -0.15636143  0.27140257  0.06787022  0.07128891  0.06222011
 [6,]  0.02792498 -0.40593409  0.28603452 -0.32013135  0.30434119 -0.30388245
 [7,]  0.06068521 -0.18724536  0.04957849 -0.16315051 -0.02569409 -0.04289883
 [8,] -0.59544729 -0.23328465  0.19550132  0.21553507  0.11689586  0.04235219
 [9,] -0.37213935  0.36822675 -0.20914487  0.13418390 -0.23736257 -0.09555303
[10,]  0.22771214 -0.03379692  0.05621752 -0.29077518  0.03183880  0.60422163
[11,] -0.23207564  0.43662362  0.08582839 -0.52239889 -0.04821201  0.25921400
[12,]  0.04476370 -0.07810789  0.13722690  0.52370587  0.04642330  0.60095872
[13,] -0.07680450  0.12002267 -0.57578611  0.16211600  0.53926983 -0.07940162
            [,13]
 [1,] -0.01496997
 [2,] -0.02596375
 [3,]  0.14121803
 [4,] -0.09168285
 [5,] -0.05677422
 [6,]  0.46390791
 [7,] -0.83225706
 [8,] -0.11403985
 [9,]  0.11691707
[10,]  0.01199280
[11,]  0.08988884
[12,]  0.15671813
[13,] -0.01444734

# Extract the first two eigenvectors from the eigenvectors matrix These are
# used as the principal components in PCA
Gamma_hat_2 = eig_A$vectors[, 1:2]
# Display the first two principal components
Gamma_hat_2

              [,1]         [,2]
 [1,] -0.144329395 -0.483651548
 [2,]  0.245187580 -0.224930935
 [3,]  0.002051061 -0.316068814
 [4,]  0.239320405  0.010590502
 [5,] -0.141992042 -0.299634003
 [6,] -0.394660845 -0.065039512
 [7,] -0.422934297  0.003359812
 [8,]  0.298533103 -0.028779488
 [9,] -0.313429488 -0.039301722
[10,]  0.088616705 -0.529995672
[11,] -0.296714564  0.279235148
[12,] -0.376167411  0.164496193
[13,] -0.286752227 -0.364902832

# Create a diagonal matrix of square roots of the first two eigenvalues These
# represent the standard deviations of the components
Lambda_hat_2 = diag(eig_A$values[1:2]^(1/2))
# Display the diagonal matrix with square roots of eigenvalues
Lambda_hat_2

        [,1]     [,2]
[1,] 2.04422 0.000000
[2,] 0.00000 1.403552

# Calculate the estimated factor loadings matrix by multiplying the matrix of
# principal components with the diagonal matrix of eigenvalues
Ahat = Gamma_hat_2 %*% Lambda_hat_2
# Display the estimated factor loadings matrix
Ahat

             [,1]        [,2]
 [1,] -0.29504100 -0.67883001
 [2,]  0.50121729 -0.31570222
 [3,]  0.00419282 -0.44361896
 [4,]  0.48922349  0.01486432
 [5,] -0.29026293 -0.42055185
 [6,] -0.80677348 -0.09128633
 [7,] -0.86457063  0.00471567
 [8,]  0.61026726 -0.04039350
 [9,] -0.64071874 -0.05516200
[10,]  0.18115202 -0.74387639
[11,] -0.60654976  0.39192100
[12,] -0.76896884  0.23087893
[13,] -0.58618456 -0.51216004

# Create a biplot of the PCA results using the estimated factor loadings The
# points are colored white and blue for differentiation
biplot(PCA.wine$x, Ahat, col = c("white", "blue"))

# Create another biplot for comparison, using the original data and the first
# two columns of matrix A (presumably the original factor loadings) The points
# are also colored white and blue for differentiation
biplot(PCA.wine$x, A[, 1:2], col = c("white", "blue"))

Question 3 Wine data -scaled

Consider the scaled data of Question 2. We calculate different ML-based 2-factor models which we want to compare. (Hint: use the factanal function.)

(a). Calculate ‘plain’ ML factor loadings, the varimax optimal orthogonal factor loadings and the varimax optimal oblique factor loadings (using promax) and compare the results of the factor loadings and biplots. (You will need scores = "regression" in the factanal function in order to get the scores for the biplots. See below.) Comment on the results.

(b). Carry out a sequence of hypothesis tests for the number of factors in the model, starting with the one-factor model.

(c). For each $k \leq k_{max}$, state the number of degrees of freedom of the chi-square distribution, which represents the approximate distribution of $-2 \log LR_k$, the $p$-value.

(d). What is the appropriate $k$ for these data? Use a significance level of 0.01. And repeat with a significance level of 0.10.

Hint:

fa.02 = factanal( wine0, 2, rotation = "none", scores = "regression" ) # this is the k=2 case, plain option
biplot( fa.02$scores, fa.02$loadings, col = c("white", "blue") )

Question 4 Life expectancy (EH 2011) (Additional question to Inge’s)

The data life show life expectancy in years by country, age, and sex. The data come from Keyfitz and Flieger (1971) and relate to life expectancies in the 1960s.

The notations “m0”, “m25”, “m50”, “m75”, “w0”, “w25”, “w50”, “w75” are for life expectancies of males (m) and females (w) at births, aged 25, 50 and 75 respectively.

life <- structure(.Data = list(c(63, 34, 38, 59, 56, 62, 50, 65, 56, 69, 65, 64,
    56, 60, 61, 49, 59, 63, 59, 65, 65, 64, 64, 67, 61, 68, 67, 65, 59, 58, 57),
    c(51, 29, 30, 42, 38, 44, 39, 44, 46, 47, 48, 50, 44, 44, 45, 40, 42, 44, 44,
        48, 48, 63, 43, 45, 40, 46, 45, 46, 43, 44, 46), c(30, 13, 17, 20, 18, 24,
        20, 22, 24, 24, 26, 28, 25, 22, 22, 22, 22, 23, 24, 28, 26, 21, 21, 23, 21,
        23, 23, 24, 23, 24, 28), c(13, 5, 7, 6, 7, 7, 7, 7, 11, 8, 9, 11, 10, 6,
        8, 9, 6, 8, 8, 14, 9, 7, 6, 8, 10, 8, 8, 9, 10, 9, 9), c(67, 38, 38, 64,
        62, 69, 55, 72, 63, 75, 68, 66, 61, 65, 65, 51, 61, 67, 63, 68, 67, 68, 68,
        74, 67, 75, 74, 71, 66, 62, 60), c(54, 32, 34, 46, 46, 50, 43, 50, 54, 53,
        50, 51, 48, 45, 49, 41, 43, 48, 46, 51, 49, 47, 47, 51, 46, 52, 51, 51, 49,
        47, 49), c(34, 17, 20, 25, 25, 28, 23, 27, 33, 29, 27, 29, 27, 25, 27, 23,
        22, 26, 25, 29, 27, 25, 24, 28, 25, 29, 28, 28, 27, 25, 28), c(15, 6, 7,
        8, 10, 14, 8, 9, 19, 10, 10, 11, 12, 9, 10, 8, 7, 9, 8, 13, 10, 9, 8, 10,
        11, 10, 10, 10, 12, 10, 11)), class = "data.frame", names = c("m0", "m25",
    "m50", "m75", "w0", "w25", "w50", "w75"), row.names = c("Algeria", "Cameroon",
    "Madagascar", "Mauritius", "Reunion", "Seychelles", "South Africa (C)", "South Africa (W)",
    "Tunisia", "Canada", "Costa Rica", "Dominican Rep.", "El Salvador", "Greenland",
    "Grenada", "Guatemala", "Honduras", "Jamaica", "Mexico", "Nicaragua", "Panama",
    "Trinidad (62)", "Trinidad (67)", "United States (66)", "United States (NW66)",
    "United States (W66)", "United States (67)", "Argentina", "Chile", "Colombia",
    "Ecuador"))
head(life)

           m0 m25 m50 m75 w0 w25 w50 w75
Algeria    63  51  30  13 67  54  34  15
Cameroon   34  29  13   5 38  32  17   6
Madagascar 38  30  17   7 38  34  20   7
Mauritius  59  42  20   6 64  46  25   8
Reunion    56  38  18   7 62  46  25  10
Seychelles 62  44  24   7 69  50  28  14

Use an asymptotic test as the formal test for the number of factors $m$ incorporated into the maximum likelihood approach.

sapply(1:3, function(f) factanal(life, factors = f, method = "mle")$PVAL)

   objective    objective    objective 
1.879555e-24 1.911514e-05 4.578204e-01

Given your solution in (a), obtain the rotated loadings.

factanal(life, factors = 3, method = "mle")


Call:
factanal(x = life, factors = 3, method = "mle")

Uniquenesses:
   m0   m25   m50   m75    w0   w25   w50   w75 
0.005 0.362 0.066 0.288 0.005 0.011 0.020 0.146 

Loadings:
    Factor1 Factor2 Factor3
m0  0.964   0.122   0.226  
m25 0.646   0.169   0.438  
m50 0.430   0.354   0.790  
m75         0.525   0.656  
w0  0.970   0.217          
w25 0.764   0.556   0.310  
w50 0.536   0.729   0.401  
w75 0.156   0.867   0.280  

               Factor1 Factor2 Factor3
SS loadings      3.375   2.082   1.640
Proportion Var   0.422   0.260   0.205
Cumulative Var   0.422   0.682   0.887

Test of the hypothesis that 3 factors are sufficient.
The chi square statistic is 6.73 on 7 degrees of freedom.
The p-value is 0.458

Interpret the factor solutions in (b)..

According to EH(2011), by examining the estimated factor loadings, we can conclude that:

The first factor is dominated by life expectancy at birth for both males (0.964) and females (0.970); perhaps this factor could be labelled life force at birth.
The second reflects life expectancies at older ages, might be labelled as life force amongst the elderly.
The third factor from the varimax rotation has its highest loadings for the life expectancies of men aged 50 and 75. This might be labelled as life force for elderly men.

Obtain the estimated factor scores for each country. Plot the factors.

scores <- factanal(life, factors = 3, method = "mle", scores = "regression")$scores
scores

                          Factor1     Factor2     Factor3
Algeria              -0.258062561  1.90095771  1.91581631
Cameroon             -2.782495791 -0.72340014 -1.84772224
Madagascar           -2.806428187 -0.81158820 -0.01210318
Mauritius             0.141004934 -0.29028454 -0.85862443
Reunion              -0.196352142  0.47429917 -1.55046466
Seychelles            0.367371307  0.82902375 -0.55214085
South Africa (C)     -1.028567629 -0.08065792 -0.65421971
South Africa (W)      0.946193522  0.06400408 -0.91995289
Tunisia              -0.862493550  3.59177195 -0.36442148
Canada                1.245304248  0.29564122 -0.27342781
Costa Rica            0.508736247 -0.50500435  1.01328707
Dominican Rep.        0.106044085  0.01111171  1.83871599
El Salvador          -0.608155779  0.65100820  0.48836431
Greenland             0.235114220 -0.69123901 -0.38558654
Grenada               0.132008172  0.25241049 -0.15220645
Guatemala            -1.450336359 -0.67765804  0.65911906
Honduras              0.043253249 -1.85175707  0.30633182
Jamaica               0.462124701 -0.51918493  0.08032855
Mexico               -0.052332675 -0.72020002  0.44417800
Nicaragua             0.268974443  0.08407227  1.70568388
Panama                0.442333434 -0.73778272  1.25218728
Trinidad (62)         0.711367053 -0.95989475 -0.21545329
Trinidad (67)         0.787286051 -1.10729029 -0.51958263
United States (66)    1.128331259  0.16389896 -0.68177046
United States (NW66)  0.400058903 -0.36230253 -0.74299137
United States (W66)   1.214345385  0.40877239 -0.69225320
United States (67)    1.128331259  0.16389896 -0.68177046
Argentina             0.731344988  0.24811968 -0.12817725
Chile                 0.009751528  0.75222637 -0.49198911
Colombia             -0.240602517 -0.29543613  0.42919600
Ecuador              -0.723451797  0.44246371  1.59164974

cex <- 0.5
plot(scores[, 1], scores[, 2], type = "n", xlab = "Factor 1", ylab = "Factor 2")
text(scores[, 1], scores[, 2], abbreviate(rownames(life), 5), cex = cex)

plot(scores[, 1], scores[, 3], type = "n", xlab = "Factor 1", ylab = "Factor 3")
text(scores[, 1], scores[, 3], abbreviate(rownames(life), 5), cex = cex)

plot(scores[, 2], scores[, 3], type = "n", xlab = "Factor 2", ylab = "Factor 3")
text(scores[, 2], scores[, 3], abbreviate(rownames(life), 5), cex = cex)

Interpretations:

- Ordering along the first axis reflects life force at birth ranging from Cameroon and Madagascar to countries such as the USA. 

- On the third axis Algeria is prominent because it has high life expectancy amongst men at higher ages, with Cameroon at the lower end of the scale with a low life expectancy for men over 50.

Question 5 Harman datasets (R package “psych”) (Additional question to Inge’s)

In this question, you can use the R package “psych” to apply PCFA and PC Axis solutions.

Five classic data sets reported by Harman (1967) are 9 psychological (cognitive) variables taken from Holzinger and 8 emotional variables taken from Burt. Two others are socioeconomic and political data sets. Additionally, 8 physical variables. Total 27 variables.

All five of these are used for tests and demonstrations of various factoring algorithms.

The data Harman74.cor is a correlation matrix of 24 psychological tests given to 145 seventh and eight-grade children in a Chicago suburb by Holzinger and Swineford. We use this dataset using the package to compare the following methods.

library(psych)
data(Harman)
# View(Harman)
Harman74.cor

$cov
                       VisualPerception Cubes PaperFormBoard Flags
VisualPerception                  1.000 0.318          0.403 0.468
Cubes                             0.318 1.000          0.317 0.230
PaperFormBoard                    0.403 0.317          1.000 0.305
Flags                             0.468 0.230          0.305 1.000
GeneralInformation                0.321 0.285          0.247 0.227
PargraphComprehension             0.335 0.234          0.268 0.327
SentenceCompletion                0.304 0.157          0.223 0.335
WordClassification                0.332 0.157          0.382 0.391
WordMeaning                       0.326 0.195          0.184 0.325
Addition                          0.116 0.057         -0.075 0.099
Code                              0.308 0.150          0.091 0.110
CountingDots                      0.314 0.145          0.140 0.160
StraightCurvedCapitals            0.489 0.239          0.321 0.327
WordRecognition                   0.125 0.103          0.177 0.066
NumberRecognition                 0.238 0.131          0.065 0.127
FigureRecognition                 0.414 0.272          0.263 0.322
ObjectNumber                      0.176 0.005          0.177 0.187
NumberFigure                      0.368 0.255          0.211 0.251
FigureWord                        0.270 0.112          0.312 0.137
Deduction                         0.365 0.292          0.297 0.339
NumericalPuzzles                  0.369 0.306          0.165 0.349
ProblemReasoning                  0.413 0.232          0.250 0.380
SeriesCompletion                  0.474 0.348          0.383 0.335
ArithmeticProblems                0.282 0.211          0.203 0.248
                       GeneralInformation PargraphComprehension
VisualPerception                    0.321                 0.335
Cubes                               0.285                 0.234
PaperFormBoard                      0.247                 0.268
Flags                               0.227                 0.327
GeneralInformation                  1.000                 0.622
PargraphComprehension               0.622                 1.000
SentenceCompletion                  0.656                 0.722
WordClassification                  0.578                 0.527
WordMeaning                         0.723                 0.714
Addition                            0.311                 0.203
Code                                0.344                 0.353
CountingDots                        0.215                 0.095
StraightCurvedCapitals              0.344                 0.309
WordRecognition                     0.280                 0.292
NumberRecognition                   0.229                 0.251
FigureRecognition                   0.187                 0.291
ObjectNumber                        0.208                 0.273
NumberFigure                        0.263                 0.167
FigureWord                          0.190                 0.251
Deduction                           0.398                 0.435
NumericalPuzzles                    0.318                 0.263
ProblemReasoning                    0.441                 0.386
SeriesCompletion                    0.435                 0.431
ArithmeticProblems                  0.420                 0.433
                       SentenceCompletion WordClassification WordMeaning
VisualPerception                    0.304              0.332       0.326
Cubes                               0.157              0.157       0.195
PaperFormBoard                      0.223              0.382       0.184
Flags                               0.335              0.391       0.325
GeneralInformation                  0.656              0.578       0.723
PargraphComprehension               0.722              0.527       0.714
SentenceCompletion                  1.000              0.619       0.685
WordClassification                  0.619              1.000       0.532
WordMeaning                         0.685              0.532       1.000
Addition                            0.246              0.285       0.170
Code                                0.232              0.300       0.280
CountingDots                        0.181              0.271       0.113
StraightCurvedCapitals              0.345              0.395       0.280
WordRecognition                     0.236              0.252       0.260
NumberRecognition                   0.172              0.175       0.248
FigureRecognition                   0.180              0.296       0.242
ObjectNumber                        0.228              0.255       0.274
NumberFigure                        0.159              0.250       0.208
FigureWord                          0.226              0.274       0.274
Deduction                           0.451              0.427       0.446
NumericalPuzzles                    0.314              0.362       0.266
ProblemReasoning                    0.396              0.357       0.483
SeriesCompletion                    0.405              0.501       0.504
ArithmeticProblems                  0.437              0.388       0.424
                       Addition  Code CountingDots StraightCurvedCapitals
VisualPerception          0.116 0.308        0.314                  0.489
Cubes                     0.057 0.150        0.145                  0.239
PaperFormBoard           -0.075 0.091        0.140                  0.321
Flags                     0.099 0.110        0.160                  0.327
GeneralInformation        0.311 0.344        0.215                  0.344
PargraphComprehension     0.203 0.353        0.095                  0.309
SentenceCompletion        0.246 0.232        0.181                  0.345
WordClassification        0.285 0.300        0.271                  0.395
WordMeaning               0.170 0.280        0.113                  0.280
Addition                  1.000 0.484        0.585                  0.408
Code                      0.484 1.000        0.428                  0.535
CountingDots              0.585 0.428        1.000                  0.512
StraightCurvedCapitals    0.408 0.535        0.512                  1.000
WordRecognition           0.172 0.350        0.131                  0.195
NumberRecognition         0.154 0.240        0.173                  0.139
FigureRecognition         0.124 0.314        0.119                  0.281
ObjectNumber              0.289 0.362        0.278                  0.194
NumberFigure              0.317 0.350        0.349                  0.323
FigureWord                0.190 0.290        0.110                  0.263
Deduction                 0.173 0.202        0.246                  0.241
NumericalPuzzles          0.405 0.399        0.355                  0.425
ProblemReasoning          0.160 0.304        0.193                  0.279
SeriesCompletion          0.262 0.251        0.350                  0.382
ArithmeticProblems        0.531 0.412        0.414                  0.358
                       WordRecognition NumberRecognition FigureRecognition
VisualPerception                 0.125             0.238             0.414
Cubes                            0.103             0.131             0.272
PaperFormBoard                   0.177             0.065             0.263
Flags                            0.066             0.127             0.322
GeneralInformation               0.280             0.229             0.187
PargraphComprehension            0.292             0.251             0.291
SentenceCompletion               0.236             0.172             0.180
WordClassification               0.252             0.175             0.296
WordMeaning                      0.260             0.248             0.242
Addition                         0.172             0.154             0.124
Code                             0.350             0.240             0.314
CountingDots                     0.131             0.173             0.119
StraightCurvedCapitals           0.195             0.139             0.281
WordRecognition                  1.000             0.370             0.412
NumberRecognition                0.370             1.000             0.325
FigureRecognition                0.412             0.325             1.000
ObjectNumber                     0.341             0.345             0.324
NumberFigure                     0.201             0.334             0.344
FigureWord                       0.206             0.192             0.258
Deduction                        0.302             0.272             0.388
NumericalPuzzles                 0.183             0.232             0.348
ProblemReasoning                 0.243             0.246             0.283
SeriesCompletion                 0.242             0.256             0.360
ArithmeticProblems               0.304             0.165             0.262
                       ObjectNumber NumberFigure FigureWord Deduction
VisualPerception              0.176        0.368      0.270     0.365
Cubes                         0.005        0.255      0.112     0.292
PaperFormBoard                0.177        0.211      0.312     0.297
Flags                         0.187        0.251      0.137     0.339
GeneralInformation            0.208        0.263      0.190     0.398
PargraphComprehension         0.273        0.167      0.251     0.435
SentenceCompletion            0.228        0.159      0.226     0.451
WordClassification            0.255        0.250      0.274     0.427
WordMeaning                   0.274        0.208      0.274     0.446
Addition                      0.289        0.317      0.190     0.173
Code                          0.362        0.350      0.290     0.202
CountingDots                  0.278        0.349      0.110     0.246
StraightCurvedCapitals        0.194        0.323      0.263     0.241
WordRecognition               0.341        0.201      0.206     0.302
NumberRecognition             0.345        0.334      0.192     0.272
FigureRecognition             0.324        0.344      0.258     0.388
ObjectNumber                  1.000        0.448      0.324     0.262
NumberFigure                  0.448        1.000      0.358     0.301
FigureWord                    0.324        0.358      1.000     0.167
Deduction                     0.262        0.301      0.167     1.000
NumericalPuzzles              0.173        0.357      0.331     0.413
ProblemReasoning              0.273        0.317      0.342     0.463
SeriesCompletion              0.287        0.272      0.303     0.509
ArithmeticProblems            0.326        0.405      0.374     0.366
                       NumericalPuzzles ProblemReasoning SeriesCompletion
VisualPerception                  0.369            0.413            0.474
Cubes                             0.306            0.232            0.348
PaperFormBoard                    0.165            0.250            0.383
Flags                             0.349            0.380            0.335
GeneralInformation                0.318            0.441            0.435
PargraphComprehension             0.263            0.386            0.431
SentenceCompletion                0.314            0.396            0.405
WordClassification                0.362            0.357            0.501
WordMeaning                       0.266            0.483            0.504
Addition                          0.405            0.160            0.262
Code                              0.399            0.304            0.251
CountingDots                      0.355            0.193            0.350
StraightCurvedCapitals            0.425            0.279            0.382
WordRecognition                   0.183            0.243            0.242
NumberRecognition                 0.232            0.246            0.256
FigureRecognition                 0.348            0.283            0.360
ObjectNumber                      0.173            0.273            0.287
NumberFigure                      0.357            0.317            0.272
FigureWord                        0.331            0.342            0.303
Deduction                         0.413            0.463            0.509
NumericalPuzzles                  1.000            0.374            0.451
ProblemReasoning                  0.374            1.000            0.503
SeriesCompletion                  0.451            0.503            1.000
ArithmeticProblems                0.448            0.375            0.434
                       ArithmeticProblems
VisualPerception                    0.282
Cubes                               0.211
PaperFormBoard                      0.203
Flags                               0.248
GeneralInformation                  0.420
PargraphComprehension               0.433
SentenceCompletion                  0.437
WordClassification                  0.388
WordMeaning                         0.424
Addition                            0.531
Code                                0.412
CountingDots                        0.414
StraightCurvedCapitals              0.358
WordRecognition                     0.304
NumberRecognition                   0.165
FigureRecognition                   0.262
ObjectNumber                        0.326
NumberFigure                        0.405
FigureWord                          0.374
Deduction                           0.366
NumericalPuzzles                    0.448
ProblemReasoning                    0.375
SeriesCompletion                    0.434
ArithmeticProblems                  1.000

$center
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

$n.obs
[1] 145

(a). Apply a principal factor analysis (FA) with a principal components (PCA) solution using the varimax rotation.

pc <- principal(Harman74.cor$cov, 4, rotate = "varimax")  #principal components
pc

Principal Components Analysis
Call: principal(r = Harman74.cor$cov, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                         RC1   RC3   RC2  RC4   h2   u2 com
VisualPerception        0.16  0.71  0.23 0.14 0.60 0.40 1.4
Cubes                   0.09  0.59  0.08 0.03 0.37 0.63 1.1
PaperFormBoard          0.14  0.66 -0.04 0.11 0.47 0.53 1.2
Flags                   0.25  0.62  0.09 0.03 0.45 0.55 1.4
GeneralInformation      0.79  0.15  0.22 0.11 0.70 0.30 1.3
PargraphComprehension   0.81  0.18  0.07 0.21 0.73 0.27 1.2
SentenceCompletion      0.85  0.15  0.15 0.06 0.77 0.23 1.1
WordClassification      0.64  0.31  0.24 0.11 0.57 0.43 1.8
WordMeaning             0.84  0.16  0.06 0.19 0.78 0.22 1.2
Addition                0.18 -0.13  0.83 0.12 0.76 0.24 1.2
Code                    0.18  0.05  0.63 0.37 0.57 0.43 1.8
CountingDots            0.02  0.17  0.80 0.05 0.67 0.33 1.1
StraightCurvedCapitals  0.18  0.41  0.62 0.03 0.59 0.41 1.9
WordRecognition         0.23 -0.01  0.06 0.68 0.52 0.48 1.2
NumberRecognition       0.12  0.08  0.05 0.67 0.48 0.52 1.1
FigureRecognition       0.06  0.46  0.05 0.58 0.55 0.45 1.9
ObjectNumber            0.14  0.01  0.24 0.68 0.54 0.46 1.4
NumberFigure           -0.02  0.32  0.40 0.50 0.51 0.49 2.7
FigureWord              0.14  0.25  0.20 0.42 0.30 0.70 2.4
Deduction               0.43  0.43  0.09 0.30 0.47 0.53 2.8
NumericalPuzzles        0.18  0.42  0.50 0.17 0.49 0.51 2.5
ProblemReasoning        0.42  0.41  0.13 0.29 0.45 0.55 3.0
SeriesCompletion        0.42  0.52  0.25 0.20 0.55 0.45 2.7
ArithmeticProblems      0.40  0.14  0.55 0.26 0.55 0.45 2.5

                       RC1  RC3  RC2  RC4
SS loadings           4.16 3.31 3.22 2.74
Proportion Var        0.17 0.14 0.13 0.11
Cumulative Var        0.17 0.31 0.45 0.56
Proportion Explained  0.31 0.25 0.24 0.20
Cumulative Proportion 0.31 0.56 0.80 1.00

Mean item complexity =  1.7
Test of the hypothesis that 4 components are sufficient.

The root mean square of the residuals (RMSR) is  0.06 

Fit based upon off diagonal values = 0.97

pa <- fa(Harman74.cor$cov, 4, fm = "pa", rotate = "varimax")  #principal axis
pa

Factor Analysis using method =  pa
Call: fa(r = Harman74.cor$cov, nfactors = 4, rotate = "varimax", fm = "pa")
Standardized loadings (pattern matrix) based upon correlation matrix
                        PA1   PA3   PA2  PA4   h2   u2 com
VisualPerception       0.15  0.68  0.20 0.15 0.55 0.45 1.4
Cubes                  0.11  0.45  0.08 0.08 0.23 0.77 1.3
PaperFormBoard         0.15  0.55 -0.01 0.11 0.34 0.66 1.2
Flags                  0.23  0.53  0.09 0.07 0.35 0.65 1.5
GeneralInformation     0.73  0.19  0.22 0.14 0.64 0.36 1.4
PargraphComprehension  0.76  0.21  0.07 0.23 0.68 0.32 1.4
SentenceCompletion     0.81  0.19  0.15 0.07 0.73 0.27 1.2
WordClassification     0.57  0.34  0.23 0.14 0.51 0.49 2.2
WordMeaning            0.81  0.20  0.05 0.22 0.74 0.26 1.3
Addition               0.17 -0.10  0.82 0.16 0.74 0.26 1.2
Code                   0.18  0.10  0.54 0.37 0.47 0.53 2.1
CountingDots           0.02  0.20  0.71 0.09 0.55 0.45 1.2
StraightCurvedCapitals 0.18  0.42  0.54 0.08 0.51 0.49 2.2
WordRecognition        0.21  0.05  0.08 0.56 0.36 0.64 1.3
NumberRecognition      0.12  0.12  0.08 0.52 0.31 0.69 1.3
FigureRecognition      0.07  0.42  0.06 0.52 0.45 0.55 2.0
ObjectNumber           0.14  0.06  0.22 0.58 0.41 0.59 1.4
NumberFigure           0.02  0.31  0.34 0.45 0.41 0.59 2.7
FigureWord             0.15  0.25  0.18 0.35 0.23 0.77 2.8
Deduction              0.38  0.42  0.10 0.29 0.42 0.58 2.9
NumericalPuzzles       0.18  0.40  0.43 0.21 0.42 0.58 2.8
ProblemReasoning       0.37  0.41  0.13 0.29 0.40 0.60 3.0
SeriesCompletion       0.37  0.52  0.23 0.22 0.51 0.49 2.7
ArithmeticProblems     0.36  0.19  0.49 0.29 0.49 0.51 2.9

                       PA1  PA3  PA2  PA4
SS loadings           3.64 2.93 2.67 2.23
Proportion Var        0.15 0.12 0.11 0.09
Cumulative Var        0.15 0.27 0.38 0.48
Proportion Explained  0.32 0.26 0.23 0.19
Cumulative Proportion 0.32 0.57 0.81 1.00

Mean item complexity =  1.9
Test of the hypothesis that 4 factors are sufficient.

df null model =  276  with the objective function =  11.44
df of  the model are 186  and the objective function was  1.72 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.05 

Fit based upon off diagonal values = 0.98
Measures of factor score adequacy             
                                                   PA1  PA3  PA2  PA4
Correlation of (regression) scores with factors   0.93 0.87 0.91 0.82
Multiple R square of scores with factors          0.87 0.76 0.82 0.68
Minimum correlation of possible factor scores     0.74 0.52 0.65 0.36

pa$fit

[1] 0.9035181

(b). To apply unweighted least squares (UWLS) and weighted least squares (WLS) methods using the varimax rotation.

# library(psych)
uls <- fa(Harman74.cor$cov, 4, rotate = "varimax")  #unweighted least squares is minres
wls <- fa(Harman74.cor$cov, 4, fm = "wls", rotate = "varimax")  #weighted least squares
print(uls, sort = TRUE)

Factor Analysis using method =  minres
Call: fa(r = Harman74.cor$cov, nfactors = 4, rotate = "varimax")
Standardized loadings (pattern matrix) based upon correlation matrix
                       item  MR1   MR3   MR2  MR4   h2   u2 com
SentenceCompletion        7 0.81  0.19  0.15 0.07 0.73 0.27 1.2
WordMeaning               9 0.81  0.20  0.05 0.22 0.74 0.26 1.3
PargraphComprehension     6 0.76  0.21  0.07 0.23 0.68 0.32 1.4
GeneralInformation        5 0.73  0.19  0.22 0.14 0.64 0.36 1.4
WordClassification        8 0.57  0.34  0.23 0.14 0.51 0.49 2.2
VisualPerception          1 0.15  0.68  0.20 0.15 0.55 0.45 1.4
PaperFormBoard            3 0.15  0.55 -0.01 0.11 0.34 0.66 1.2
Flags                     4 0.23  0.53  0.09 0.07 0.35 0.65 1.5
SeriesCompletion         23 0.37  0.52  0.23 0.22 0.51 0.49 2.7
Cubes                     2 0.11  0.45  0.08 0.08 0.23 0.77 1.3
Deduction                20 0.38  0.42  0.10 0.29 0.42 0.58 2.9
ProblemReasoning         22 0.37  0.41  0.13 0.29 0.40 0.60 3.0
Addition                 10 0.17 -0.11  0.82 0.16 0.74 0.26 1.2
CountingDots             12 0.02  0.20  0.71 0.09 0.55 0.45 1.2
StraightCurvedCapitals   13 0.18  0.42  0.54 0.08 0.51 0.49 2.2
Code                     11 0.18  0.11  0.54 0.37 0.47 0.53 2.1
ArithmeticProblems       24 0.36  0.19  0.49 0.29 0.49 0.51 2.9
NumericalPuzzles         21 0.18  0.40  0.43 0.21 0.42 0.58 2.8
ObjectNumber             17 0.14  0.06  0.22 0.58 0.41 0.59 1.4
WordRecognition          14 0.21  0.05  0.08 0.56 0.36 0.64 1.3
NumberRecognition        15 0.12  0.12  0.08 0.52 0.31 0.69 1.3
FigureRecognition        16 0.07  0.42  0.06 0.52 0.45 0.55 2.0
NumberFigure             18 0.02  0.31  0.34 0.45 0.41 0.59 2.7
FigureWord               19 0.15  0.25  0.18 0.35 0.23 0.77 2.8

                       MR1  MR3  MR2  MR4
SS loadings           3.64 2.93 2.67 2.23
Proportion Var        0.15 0.12 0.11 0.09
Cumulative Var        0.15 0.27 0.38 0.48
Proportion Explained  0.32 0.26 0.23 0.19
Cumulative Proportion 0.32 0.57 0.81 1.00

Mean item complexity =  1.9
Test of the hypothesis that 4 factors are sufficient.

df null model =  276  with the objective function =  11.44
df of  the model are 186  and the objective function was  1.72 

The root mean square of the residuals (RMSR) is  0.04 
The df corrected root mean square of the residuals is  0.05 

Fit based upon off diagonal values = 0.98
Measures of factor score adequacy             
                                                   MR1  MR3  MR2  MR4
Correlation of (regression) scores with factors   0.93 0.87 0.91 0.82
Multiple R square of scores with factors          0.87 0.76 0.83 0.68
Minimum correlation of possible factor scores     0.74 0.52 0.65 0.36

(c). To apply MLE method for FA using the varimax rotation.

mle <- factanal(covmat = Harman74.cor$cov, factors = 4)  ## using 'psych' package
mle


Call:
factanal(factors = 4, covmat = Harman74.cor$cov)

Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard 
                 0.438                  0.780                  0.644 
                 Flags     GeneralInformation  PargraphComprehension 
                 0.651                  0.352                  0.312 
    SentenceCompletion     WordClassification            WordMeaning 
                 0.283                  0.485                  0.257 
              Addition                   Code           CountingDots 
                 0.240                  0.551                  0.435 
StraightCurvedCapitals        WordRecognition      NumberRecognition 
                 0.491                  0.646                  0.696 
     FigureRecognition           ObjectNumber           NumberFigure 
                 0.549                  0.598                  0.593 
            FigureWord              Deduction       NumericalPuzzles 
                 0.762                  0.592                  0.583 
      ProblemReasoning       SeriesCompletion     ArithmeticProblems 
                 0.601                  0.497                  0.500 

Loadings:
                       Factor1 Factor2 Factor3 Factor4
VisualPerception        0.160   0.689   0.187   0.160 
Cubes                   0.117   0.436                 
PaperFormBoard          0.137   0.570           0.110 
Flags                   0.233   0.527                 
GeneralInformation      0.739   0.185   0.213   0.150 
PargraphComprehension   0.767   0.205           0.233 
SentenceCompletion      0.806   0.197   0.153         
WordClassification      0.569   0.339   0.242   0.132 
WordMeaning             0.806   0.201           0.227 
Addition                0.167  -0.118   0.831   0.166 
Code                    0.180   0.120   0.512   0.374 
CountingDots                    0.210   0.716         
StraightCurvedCapitals  0.188   0.438   0.525         
WordRecognition         0.197                   0.553 
NumberRecognition       0.122   0.116           0.520 
FigureRecognition               0.408           0.525 
ObjectNumber            0.142           0.219   0.574 
NumberFigure                    0.293   0.336   0.456 
FigureWord              0.148   0.239   0.161   0.365 
Deduction               0.378   0.402   0.118   0.301 
NumericalPuzzles        0.175   0.381   0.438   0.223 
ProblemReasoning        0.366   0.399   0.123   0.301 
SeriesCompletion        0.369   0.500   0.244   0.239 
ArithmeticProblems      0.370   0.158   0.496   0.304 

               Factor1 Factor2 Factor3 Factor4
SS loadings      3.647   2.872   2.657   2.290
Proportion Var   0.152   0.120   0.111   0.095
Cumulative Var   0.152   0.272   0.382   0.478

The degrees of freedom for the model is 186 and the fit was 1.7108

mle1 <- factanal(factors = 4, covmat = Harman74.cor$cov)  ## factanal basic package 
mle1


Call:
factanal(factors = 4, covmat = Harman74.cor$cov)

Uniquenesses:
      VisualPerception                  Cubes         PaperFormBoard 
                 0.438                  0.780                  0.644 
                 Flags     GeneralInformation  PargraphComprehension 
                 0.651                  0.352                  0.312 
    SentenceCompletion     WordClassification            WordMeaning 
                 0.283                  0.485                  0.257 
              Addition                   Code           CountingDots 
                 0.240                  0.551                  0.435 
StraightCurvedCapitals        WordRecognition      NumberRecognition 
                 0.491                  0.646                  0.696 
     FigureRecognition           ObjectNumber           NumberFigure 
                 0.549                  0.598                  0.593 
            FigureWord              Deduction       NumericalPuzzles 
                 0.762                  0.592                  0.583 
      ProblemReasoning       SeriesCompletion     ArithmeticProblems 
                 0.601                  0.497                  0.500 

Loadings:
                       Factor1 Factor2 Factor3 Factor4
VisualPerception        0.160   0.689   0.187   0.160 
Cubes                   0.117   0.436                 
PaperFormBoard          0.137   0.570           0.110 
Flags                   0.233   0.527                 
GeneralInformation      0.739   0.185   0.213   0.150 
PargraphComprehension   0.767   0.205           0.233 
SentenceCompletion      0.806   0.197   0.153         
WordClassification      0.569   0.339   0.242   0.132 
WordMeaning             0.806   0.201           0.227 
Addition                0.167  -0.118   0.831   0.166 
Code                    0.180   0.120   0.512   0.374 
CountingDots                    0.210   0.716         
StraightCurvedCapitals  0.188   0.438   0.525         
WordRecognition         0.197                   0.553 
NumberRecognition       0.122   0.116           0.520 
FigureRecognition               0.408           0.525 
ObjectNumber            0.142           0.219   0.574 
NumberFigure                    0.293   0.336   0.456 
FigureWord              0.148   0.239   0.161   0.365 
Deduction               0.378   0.402   0.118   0.301 
NumericalPuzzles        0.175   0.381   0.438   0.223 
ProblemReasoning        0.366   0.399   0.123   0.301 
SeriesCompletion        0.369   0.500   0.244   0.239 
ArithmeticProblems      0.370   0.158   0.496   0.304 

               Factor1 Factor2 Factor3 Factor4
SS loadings      3.647   2.872   2.657   2.290
Proportion Var   0.152   0.120   0.111   0.095
Cumulative Var   0.152   0.272   0.382   0.478

The degrees of freedom for the model is 186 and the fit was 1.7108

(d). Compare the unrotated factor, ml, uls, and wls solutions.

# compare with a maximum likelihood solution using factanal
factor.congruence(list(mle, pa, pc, uls, wls))

        Factor1 Factor2 Factor3 Factor4  PA1  PA3  PA2  PA4  RC1  RC3  RC2  RC4
Factor1    1.00    0.61    0.46    0.56 1.00 0.61 0.46 0.55 1.00 0.54 0.44 0.47
Factor2    0.61    1.00    0.50    0.61 0.61 1.00 0.50 0.60 0.60 0.99 0.49 0.52
Factor3    0.46    0.50    1.00    0.57 0.46 0.50 1.00 0.56 0.45 0.44 1.00 0.48
Factor4    0.56    0.61    0.57    1.00 0.56 0.62 0.58 1.00 0.55 0.55 0.56 0.99
PA1        1.00    0.61    0.46    0.56 1.00 0.61 0.46 0.55 1.00 0.54 0.44 0.47
PA3        0.61    1.00    0.50    0.62 0.61 1.00 0.50 0.61 0.61 0.99 0.50 0.53
PA2        0.46    0.50    1.00    0.58 0.46 0.50 1.00 0.57 0.46 0.44 1.00 0.49
PA4        0.55    0.60    0.56    1.00 0.55 0.61 0.57 1.00 0.54 0.54 0.55 0.99
RC1        1.00    0.60    0.45    0.55 1.00 0.61 0.46 0.54 1.00 0.53 0.43 0.46
RC3        0.54    0.99    0.44    0.55 0.54 0.99 0.44 0.54 0.53 1.00 0.43 0.47
RC2        0.44    0.49    1.00    0.56 0.44 0.50 1.00 0.55 0.43 0.43 1.00 0.47
RC4        0.47    0.52    0.48    0.99 0.47 0.53 0.49 0.99 0.46 0.47 0.47 1.00
MR1        1.00    0.61    0.46    0.56 1.00 0.61 0.46 0.55 1.00 0.54 0.44 0.47
MR3        0.61    1.00    0.50    0.62 0.61 1.00 0.50 0.61 0.61 0.99 0.50 0.53
MR2        0.46    0.50    1.00    0.58 0.46 0.50 1.00 0.57 0.46 0.44 1.00 0.49
MR4        0.55    0.60    0.56    1.00 0.55 0.61 0.57 1.00 0.54 0.54 0.55 0.99
WLS1       1.00    0.61    0.46    0.56 1.00 0.61 0.46 0.55 1.00 0.54 0.44 0.47
WLS3       0.61    1.00    0.50    0.62 0.61 1.00 0.50 0.61 0.60 0.99 0.50 0.53
WLS2       0.46    0.50    1.00    0.58 0.46 0.50 1.00 0.57 0.45 0.44 1.00 0.49
WLS4       0.55    0.60    0.57    1.00 0.55 0.61 0.57 1.00 0.54 0.54 0.56 0.99
         MR1  MR3  MR2  MR4 WLS1 WLS3 WLS2 WLS4
Factor1 1.00 0.61 0.46 0.55 1.00 0.61 0.46 0.55
Factor2 0.61 1.00 0.50 0.60 0.61 1.00 0.50 0.60
Factor3 0.46 0.50 1.00 0.56 0.46 0.50 1.00 0.57
Factor4 0.56 0.62 0.58 1.00 0.56 0.62 0.58 1.00
PA1     1.00 0.61 0.46 0.55 1.00 0.61 0.46 0.55
PA3     0.61 1.00 0.50 0.61 0.61 1.00 0.50 0.61
PA2     0.46 0.50 1.00 0.57 0.46 0.50 1.00 0.57
PA4     0.55 0.61 0.57 1.00 0.55 0.61 0.57 1.00
RC1     1.00 0.61 0.46 0.54 1.00 0.60 0.45 0.54
RC3     0.54 0.99 0.44 0.54 0.54 0.99 0.44 0.54
RC2     0.44 0.50 1.00 0.55 0.44 0.50 1.00 0.56
RC4     0.47 0.53 0.49 0.99 0.47 0.53 0.49 0.99
MR1     1.00 0.61 0.46 0.55 1.00 0.61 0.46 0.55
MR3     0.61 1.00 0.50 0.61 0.61 1.00 0.50 0.61
MR2     0.46 0.50 1.00 0.57 0.46 0.50 1.00 0.57
MR4     0.55 0.61 0.57 1.00 0.55 0.61 0.57 1.00
WLS1    1.00 0.61 0.46 0.55 1.00 0.61 0.46 0.55
WLS3    0.61 1.00 0.50 0.61 0.61 1.00 0.50 0.61
WLS2    0.46 0.50 1.00 0.57 0.46 0.50 1.00 0.57
WLS4    0.55 0.61 0.57 1.00 0.55 0.61 0.57 1.00

Note the similarity of the mle and min res (UWLS) solutions.
Note that the order of factors and the sign of some of factors may differ.

STAT3064/STAT5061 Statistical Learning/Statistical Data Science

Computer Lab Week 5 2023

Things you may need to know/do.

Question 1 Aeroplane data

Question 2 Wine data

Code chunk for (d) and (e)

Question 3 Wine data -scaled

Question 4 Life expectancy (EH 2011) (Additional question to Inge’s)

Question 5 Harman datasets (R package “psych”) (Additional question to Inge’s)