#8.2 (in (b), use the likelihood ratio statistic to test the adequacy of the number of factors chosen). The dataset Harman23.cor in the datasets package is a correlation matrix of eight physical measurements made on 305 girls between the ages of 7 and 17.

(a) Perform a factor analysis of this data using the command factanal(factors = m, covmat = Harman23.cor) where m is the number of factors.
library(datasets)
data("Harman23.cor")
head(Harman23.cor)
## $cov
##                height arm.span forearm lower.leg weight bitro.diameter
## height          1.000    0.846   0.805     0.859  0.473          0.398
## arm.span        0.846    1.000   0.881     0.826  0.376          0.326
## forearm         0.805    0.881   1.000     0.801  0.380          0.319
## lower.leg       0.859    0.826   0.801     1.000  0.436          0.329
## weight          0.473    0.376   0.380     0.436  1.000          0.762
## bitro.diameter  0.398    0.326   0.319     0.329  0.762          1.000
## chest.girth     0.301    0.277   0.237     0.327  0.730          0.583
## chest.width     0.382    0.415   0.345     0.365  0.629          0.577
##                chest.girth chest.width
## height               0.301       0.382
## arm.span             0.277       0.415
## forearm              0.237       0.345
## lower.leg            0.327       0.365
## weight               0.730       0.629
## bitro.diameter       0.583       0.577
## chest.girth          1.000       0.539
## chest.width          0.539       1.000
## 
## $center
## [1] 0 0 0 0 0 0 0 0
## 
## $n.obs
## [1] 305
factanal(factors = 3, covmat = Harman23.cor, cor = TRUE)
## 
## Call:
## factanal(factors = 3, covmat = Harman23.cor, cor = TRUE)
## 
## Uniquenesses:
##         height       arm.span        forearm      lower.leg         weight 
##          0.127          0.005          0.193          0.157          0.090 
## bitro.diameter    chest.girth    chest.width 
##          0.359          0.411          0.490 
## 
## Loadings:
##                Factor1 Factor2 Factor3
## height          0.886   0.267  -0.130 
## arm.span        0.937   0.195   0.280 
## forearm         0.874   0.188         
## lower.leg       0.877   0.230  -0.145 
## weight          0.242   0.916  -0.106 
## bitro.diameter  0.193   0.777         
## chest.girth     0.137   0.755         
## chest.width     0.261   0.646   0.159 
## 
##                Factor1 Factor2 Factor3
## SS loadings      3.379   2.628   0.162
## Proportion Var   0.422   0.329   0.020
## Cumulative Var   0.422   0.751   0.771
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 22.81 on 7 degrees of freedom.
## The p-value is 0.00184
factanal(factors = 3, covmat = Harman23.cor)
## 
## Call:
## factanal(factors = 3, covmat = Harman23.cor)
## 
## Uniquenesses:
##         height       arm.span        forearm      lower.leg         weight 
##          0.127          0.005          0.193          0.157          0.090 
## bitro.diameter    chest.girth    chest.width 
##          0.359          0.411          0.490 
## 
## Loadings:
##                Factor1 Factor2 Factor3
## height          0.886   0.267  -0.130 
## arm.span        0.937   0.195   0.280 
## forearm         0.874   0.188         
## lower.leg       0.877   0.230  -0.145 
## weight          0.242   0.916  -0.106 
## bitro.diameter  0.193   0.777         
## chest.girth     0.137   0.755         
## chest.width     0.261   0.646   0.159 
## 
##                Factor1 Factor2 Factor3
## SS loadings      3.379   2.628   0.162
## Proportion Var   0.422   0.329   0.020
## Cumulative Var   0.422   0.751   0.771
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 22.81 on 7 degrees of freedom.
## The p-value is 0.00184
(b) Vary the number of factors to find an adequate fit of the model and interpret the resulting factor loadings.
factanal(factors = 2, covmat = Harman23.cor)
## 
## Call:
## factanal(factors = 2, covmat = Harman23.cor)
## 
## Uniquenesses:
##         height       arm.span        forearm      lower.leg         weight 
##          0.170          0.107          0.166          0.199          0.089 
## bitro.diameter    chest.girth    chest.width 
##          0.364          0.416          0.537 
## 
## Loadings:
##                Factor1 Factor2
## height         0.865   0.287  
## arm.span       0.927   0.181  
## forearm        0.895   0.179  
## lower.leg      0.859   0.252  
## weight         0.233   0.925  
## bitro.diameter 0.194   0.774  
## chest.girth    0.134   0.752  
## chest.width    0.278   0.621  
## 
##                Factor1 Factor2
## SS loadings      3.335   2.617
## Proportion Var   0.417   0.327
## Cumulative Var   0.417   0.744
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 75.74 on 13 degrees of freedom.
## The p-value is 6.94e-11
(c) Does the principal component analysis produce different conclusions when the correlation matrix (cor = TRUE) option is used? Which analysis do you prefer?

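Answer Neither analysis is adequate by the likelihood ratio test. The 2-factor model is rejected (chi-square 75.74 on 13 degrees of freedom, p = 6.94e-11), and so is the 3-factor model (chi-square 22.81 on 7 degrees of freedom, p = 0.00184). In both fits, Factor 1 loads mainly on height, arm.span, forearm, and lower.leg (overall length), while Factor 2 loads mainly on weight, bitro.diameter, chest.girth, and chest.width (overall bulk). There is no difference when cor = TRUE is used: the two 3-factor outputs are identical, which is expected because Harman23.cor is already a correlation matrix. I would prefer the one with 3 factors because it fits better and accounts for slightly more variability (cumulative 0.771 versus 0.744), although the third factor contributes only 0.020 of the variance.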
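The likelihood ratio test that factanal prints can also be collected for several values of m at once. This is a minimal sketch, assuming the STATISTIC, dof, and PVAL components of the fitted object; the name lr_table is made up for illustration.

```r
library(datasets)

# Fit m = 1..4 factors and keep the likelihood ratio test from each fit.
lr_table <- do.call(rbind, lapply(1:4, function(m) {
  fit <- factanal(factors = m, covmat = Harman23.cor)
  data.frame(factors = m,
             chisq   = unname(fit$STATISTIC),
             df      = fit$dof,
             p_value = unname(fit$PVAL))
}))
print(lr_table)

# The degrees of freedom follow ((p - m)^2 - (p + m)) / 2 with p = 8 variables.
p <- 8
expected_df <- ((p - lr_table$factors)^2 - (p + lr_table$factors)) / 2
all(lr_table$df == expected_df)
```

A small p-value in a row means the model with that many factors is rejected, so an adequate m is the smallest one whose p-value is not small.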

#8.5 Examine the USJudgeRatings data in the datasets library. This dataset contains the ratings of 43 US Superior Court judges by attorneys. Each of the judges is evaluated on each of 12 attributes such as demeanor, preparation for trial, sound rulings, and the number of contacts each attorney had with that judge. See the R help file for more information on this dataset. (a) Examine the pair-wise scatterplot for this data (with the pairs command) to reveal that some variables are very highly correlated.

data(USJudgeRatings)
head(USJudgeRatings)
##                CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
## AARONSON,L.H.   5.7  7.9  7.7  7.3  7.1  7.4  7.1  7.1  7.1  7.0  8.3  7.8
## ALEXANDER,J.M.  6.8  8.9  8.8  8.5  7.8  8.1  8.0  8.0  7.8  7.9  8.5  8.7
## ARMENTANO,A.J.  7.2  8.1  7.8  7.8  7.5  7.6  7.5  7.5  7.3  7.4  7.9  7.8
## BERDON,R.I.     6.8  8.8  8.5  8.8  8.3  8.5  8.7  8.7  8.4  8.5  8.8  8.7
## BRACKEN,J.J.    7.3  6.4  4.3  6.5  6.0  6.2  5.7  5.7  5.1  5.3  5.5  4.8
## BURNS,E.B.      6.2  8.8  8.7  8.5  7.9  8.0  8.1  8.0  8.0  8.0  8.6  8.6
pairs(USJudgeRatings)

(b) Perform a principal components analysis for this data. The first two components explain 94% of the variability. The second component is almost entirely the number of contacts, and the first component is essentially all other variables, all given the same weight. Interpret this result.
pcjudge <- princomp(USJudgeRatings)
summary(pcjudge)
## Importance of components:
##                           Comp.1     Comp.2     Comp.3     Comp.4      Comp.5
## Standard deviation     2.9944812 0.97631616 0.54821237 0.46436892 0.260897305
## Proportion of Variance 0.8476102 0.09010189 0.02840865 0.02038352 0.006434164
## Cumulative Proportion  0.8476102 0.93771205 0.96612070 0.98650423 0.992938391
##                             Comp.6      Comp.7      Comp.8       Comp.9
## Standard deviation     0.172098999 0.127444176 0.107411902 0.0862216855
## Proportion of Variance 0.002799688 0.001535299 0.001090581 0.0007027259
## Cumulative Proportion  0.995738079 0.997273378 0.998363959 0.9990666848
##                             Comp.10      Comp.11      Comp.12
## Standard deviation     0.0703843063 0.0550575991 0.0434546479
## Proportion of Variance 0.0004682789 0.0002865415 0.0001784947
## Cumulative Proportion  0.9995349638 0.9998215053 1.0000000000
pcjudge$loadings
## 
## Loadings:
##      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## CONT         0.933  0.320  0.113                                           
## INTG -0.235 -0.139  0.370  0.252         0.463 -0.366 -0.418  0.377 -0.180 
## DMNR -0.348 -0.232  0.663        -0.194 -0.361  0.394  0.167  0.123        
## DILG -0.287        -0.224  0.273 -0.376  0.564  0.255  0.283         0.416 
## CFMG -0.272  0.163 -0.189        -0.480 -0.170  0.109 -0.680 -0.269 -0.132 
## DECI -0.253  0.118 -0.249        -0.420 -0.369 -0.483  0.318  0.408        
## PREP -0.309        -0.217  0.191  0.146         0.384  0.169        -0.641 
## FAMI -0.305        -0.267  0.169  0.471 -0.108                0.229 -0.123 
## ORAL -0.332                       0.253 -0.142        -0.117 -0.272  0.355 
## WRIT -0.314        -0.115  0.142  0.295 -0.227 -0.102 -0.142         0.435 
## PHYS -0.278               -0.859         0.241  0.159         0.275        
## RTEN -0.359         0.196 -0.153         0.164 -0.466  0.294 -0.625 -0.155 
##      Comp.11 Comp.12
## CONT                
## INTG -0.160         
## DMNR  0.113         
## DILG                
## CFMG  0.194         
## DECI -0.187         
## PREP -0.340   0.293 
## FAMI  0.535  -0.468 
## ORAL -0.637  -0.430 
## WRIT  0.106   0.703 
## PHYS                
## RTEN  0.245         
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.083  0.083  0.083  0.083  0.083  0.083  0.083  0.083  0.083
## Cumulative Var  0.083  0.167  0.250  0.333  0.417  0.500  0.583  0.667  0.750
##                Comp.10 Comp.11 Comp.12
## SS loadings      1.000   1.000   1.000
## Proportion Var   0.083   0.083   0.083
## Cumulative Var   0.833   0.917   1.000

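Answer The first component explains about 85% of the variance and weights every rating except CONT almost equally (loadings between -0.235 and -0.359), so it is an overall quality score: the eleven ratings are so highly correlated that one average nearly replaces all of them. The common negative sign is arbitrary. The second component explains a further 9% and is almost entirely CONT (loading 0.933), which suggests that the number of contacts is largely unrelated to how a judge is rated. Together the two components account for about 94% of the variability.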
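The proportions quoted in the exercise can be recomputed from the component standard deviations, and the relationship between CONT and the other ratings can be inspected directly. This is a sketch only; the names prop_var and pcjudge_cor are made up for illustration, and the cor = TRUE fit is added purely for comparison.

```r
library(datasets)

pcjudge <- princomp(USJudgeRatings)

# Share of the total variance carried by each component.
prop_var <- pcjudge$sdev^2 / sum(pcjudge$sdev^2)
round(prop_var[1:2], 3)

# Correlation of CONT with each of the other eleven ratings.
round(cor(USJudgeRatings)["CONT", -1], 2)

# The same analysis on standardized variables, for comparison.
pcjudge_cor <- princomp(USJudgeRatings, cor = TRUE)
summary(pcjudge_cor)
```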

#8.6 (a-b, d) Six different tests of intelligence and ability were administered to 112 people. The covariance matrix (but not the original data) of the test results is given in ability.cov in the datasets library. The six tests are called general, picture, blocks, maze, reading, and vocab. More information is given in the R help file. (a) Perform a factor analysis on the covariance matrix with factanal(factors = 2, covmat=ability.cov)

data(ability.cov)
head(ability.cov)
## $cov
##         general picture  blocks   maze reading   vocab
## general  24.641   5.991  33.520  6.023  20.755  29.701
## picture   5.991   6.700  18.137  1.782   4.936   7.204
## blocks   33.520  18.137 149.831 19.424  31.430  50.753
## maze      6.023   1.782  19.424 12.711   4.757   9.075
## reading  20.755   4.936  31.430  4.757  52.604  66.762
## vocab    29.701   7.204  50.753  9.075  66.762 135.292
## 
## $center
## [1] 0 0 0 0 0 0
## 
## $n.obs
## [1] 112
factanal(factors = 2, covmat=ability.cov)
## 
## Call:
## factanal(factors = 2, covmat = ability.cov)
## 
## Uniquenesses:
## general picture  blocks    maze reading   vocab 
##   0.455   0.589   0.218   0.769   0.052   0.334 
## 
## Loadings:
##         Factor1 Factor2
## general 0.499   0.543  
## picture 0.156   0.622  
## blocks  0.206   0.860  
## maze    0.109   0.468  
## reading 0.956   0.182  
## vocab   0.785   0.225  
## 
##                Factor1 Factor2
## SS loadings      1.858   1.724
## Proportion Var   0.310   0.287
## Cumulative Var   0.310   0.597
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
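
To see which variables group together under each factor, the loadings can be printed with a higher cutoff and each variable assigned to the factor on which it loads most heavily. This is a minimal sketch; the cutoff of 0.4 and the names fa and dominant are arbitrary choices for illustration.

```r
library(datasets)

fa <- factanal(factors = 2, covmat = ability.cov)

# Hide small loadings and sort the variables by the factor they load on.
print(fa$loadings, cutoff = 0.4, sort = TRUE)

# Assign each variable to the factor with its largest absolute loading.
dominant <- apply(abs(unclass(fa$loadings)), 1, which.max)
split(names(dominant), dominant)
```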

Use the loadings to identify those variables that group together within the first two factors. Interpret these factors. (b) Perform a principal components analysis using the covariance matrix

data(ability.cov)
# princomp() accepts a covariance list (cov, center, n.obs) through covmat
pc <- princomp(covmat = ability.cov)
pc$loadings

and identify the variables making the largest contributions to the first two principal components. How do you interpret these principal components?

(d) The cov2cor function efficiently converts covariances into correlation matrices. That is
ability.cor <- cov2cor(ability.cov$cov)
princomp(ability.cor)
## Call:
## princomp(x = ability.cor)
## 
## Standard deviations:
##    Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6 
## 0.4776241 0.3448660 0.1981002 0.1570958 0.0818317 0.0000000 
## 
##  6  variables and  6 observations.
princomp(ability.cor)$loadings
## 
## Loadings:
##         Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## general  0.136  0.226  0.794  0.522  0.161       
## picture -0.280  0.683 -0.284  0.157         0.589
## blocks  -0.263  0.252  0.499 -0.783              
## maze    -0.366 -0.645  0.179                0.644
## reading  0.599                      -0.708  0.358
## vocab    0.584               -0.283  0.683  0.327
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.167  0.167  0.167  0.167  0.167  0.167
## Cumulative Var  0.167  0.333  0.500  0.667  0.833  1.000

obtains the correlation and performs the principal components analysis. Examine the loadings and interpret the first two principal components. Compare this data summary with parts (a) and (b). How do these differ? How are they similar?

Answer

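For the second PCA, the first component is dominated by reading (0.599) and vocab (0.584), with smaller negative loadings on maze, picture, and blocks, so it contrasts the two verbal tests with the others. The second component contrasts picture (0.683) with maze (-0.645). These values are loadings (coefficients of the component), not percentages of variability. The two components are orthogonal by construction, so they carry uncorrelated information. Like the factor analysis in (a), this groups reading and vocab together, apart from picture, blocks, and maze. One caution: princomp(ability.cor) treats the 6 x 6 correlation matrix as a data set, which is why the output reports 6 variables and 6 observations and a sixth standard deviation of zero; passing the matrix through the covmat argument analyzes the correlations themselves.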
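
A sketch of the correlation-based analysis using the covmat argument of princomp, so that the matrix is decomposed directly instead of being read as observations. The names pc_cor and ev are made up for illustration.

```r
library(datasets)

ability.cor <- cov2cor(ability.cov$cov)

# Decompose the correlation matrix itself.
pc_cor <- princomp(covmat = ability.cor)
summary(pc_cor)
pc_cor$loadings

# The component variances are the eigenvalues of the correlation matrix,
# so they add up to the number of variables (6).
ev <- eigen(ability.cor)$values
all.equal(unname(pc_cor$sdev^2), ev)
sum(pc_cor$sdev^2)
```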

#8.11 (a-b).

The Synthetic Aperture Personality Assessment (SAPA) Project is a web-based psychological data collection. A subset of the data is available in R as bfi in the psych library. This subset contains data on three demographic variables and 25 personality items submitted by 2800 volunteers. As examples of these items, we have: "I know how to comfort others", "I waste my time", and "I make friends easily". Each item is rated on a scale of 1–6, on whether the respondent feels that he or she agrees with the statement a lot, disagrees a lot, or falls somewhere in between. See the bfi help file for more details.

(a) Use the complete.cases() command to remove individuals in bfi with any missing values.
library(psych)
data(bfi)

new_bfi <- bfi[complete.cases(bfi), ]
(b) Use factor analysis to group together items of a similar nature. Try to interpret the nature of items that cluster together. This is a useful exercise in psychology. The chi-squared test of the adequacy of the number of factors may not be appropriate with such a large sample size.
factanal(new_bfi,factors = 3)
## 
## Call:
## factanal(x = new_bfi, factors = 3)
## 
## Uniquenesses:
##        A1        A2        A3        A4        A5        C1        C2        C3 
##     0.940     0.721     0.616     0.814     0.579     0.661     0.615     0.722 
##        C4        C5        E1        E2        E3        E4        E5        N1 
##     0.532     0.635     0.714     0.579     0.594     0.493     0.630     0.351 
##        N2        N3        N4        N5        O1        O2        O3        O4 
##     0.366     0.468     0.604     0.725     0.882     0.918     0.824     0.956 
##        O5    gender education       age 
##     0.946     0.948     0.994     0.975 
## 
## Loadings:
##           Factor1 Factor2 Factor3
## A1        -0.205   0.132         
## A2         0.517           0.105 
## A3         0.614                 
## A4         0.396           0.144 
## A5         0.622  -0.172         
## C1         0.109           0.569 
## C2         0.115   0.113   0.599 
## C3         0.100           0.517 
## C4                 0.169  -0.657 
## C5        -0.168   0.232  -0.532 
## E1        -0.533                 
## E2        -0.592   0.233  -0.127 
## E3         0.623           0.133 
## E4         0.691  -0.162         
## E5         0.496           0.347 
## N1                 0.796  -0.123 
## N2                 0.791         
## N3                 0.717  -0.130 
## N4        -0.242   0.545  -0.202 
## N5                 0.498  -0.160 
## O1         0.242           0.243 
## O2                 0.127  -0.256 
## O3         0.363           0.208 
## O4                 0.206         
## O5                        -0.219 
## gender     0.198   0.110         
## education                        
## age                        0.113 
## 
##                Factor1 Factor2 Factor3
## SS loadings      3.349   2.644   2.203
## Proportion Var   0.120   0.094   0.079
## Cumulative Var   0.120   0.214   0.293
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 4818.04 on 297 degrees of freedom.
## The p-value is 0

Answer

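Three factors account for only 29.3% of the total variance, so they summarize the items only partially. The first factor groups the A (agreeableness) and E (extraversion) items: A2, A3, A5, E3, E4, and E5 load positively (E4 at 0.691 and A5 at 0.622 are among the largest), while E1 and E2 load negatively. The second factor is the N (neuroticism) items, led by N1 (0.796) and N2 (0.791). The third factor is the C (conscientiousness) items, with C1 to C3 positive and C4 (-0.657) and C5 negative. The O items and the three demographic variables load weakly on every factor (uniquenesses of 0.82 or more). The chi-squared test rejects three factors, but as the exercise notes, it may not be appropriate with a sample this large.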
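
Since gender, education, and age are demographic variables and not personality items, one variation is to fit the model to the 25 items alone. The sketch below does this with five factors, one per item prefix (A, C, E, N, O). It assumes that bfi ships with the installed version of psych, as in the code above; the names items and fa5 and the cutoff of 0.3 are made up for illustration.

```r
if (requireNamespace("psych", quietly = TRUE)) {
  data(bfi, package = "psych")
  new_bfi <- bfi[complete.cases(bfi), ]

  # Keep only the 25 personality items.
  items <- new_bfi[, setdiff(names(new_bfi), c("gender", "education", "age"))]

  # Five factors, one per item prefix; this choice is an assumption.
  fa5 <- factanal(items, factors = 5)
  print(fa5$loadings, cutoff = 0.3, sort = TRUE)
}
```

Sorting the loadings makes it easier to see whether items with the same prefix land on the same factor.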