#8.2 (in (b), use the likelihood ratio statistic to test the adequacy of the number of factors chosen). The dataset Harman23.cor in the datasets package is a correlation matrix of eight physical measurements made on 305 girls between the ages of 7 and 17.
library(datasets)
data("Harman23.cor")
head(Harman23.cor)
## $cov
## height arm.span forearm lower.leg weight bitro.diameter
## height 1.000 0.846 0.805 0.859 0.473 0.398
## arm.span 0.846 1.000 0.881 0.826 0.376 0.326
## forearm 0.805 0.881 1.000 0.801 0.380 0.319
## lower.leg 0.859 0.826 0.801 1.000 0.436 0.329
## weight 0.473 0.376 0.380 0.436 1.000 0.762
## bitro.diameter 0.398 0.326 0.319 0.329 0.762 1.000
## chest.girth 0.301 0.277 0.237 0.327 0.730 0.583
## chest.width 0.382 0.415 0.345 0.365 0.629 0.577
## chest.girth chest.width
## height 0.301 0.382
## arm.span 0.277 0.415
## forearm 0.237 0.345
## lower.leg 0.327 0.365
## weight 0.730 0.629
## bitro.diameter 0.583 0.577
## chest.girth 1.000 0.539
## chest.width 0.539 1.000
##
## $center
## [1] 0 0 0 0 0 0 0 0
##
## $n.obs
## [1] 305
factanal(factors = 3, covmat = Harman23.cor, cor = TRUE)
##
## Call:
## factanal(factors = 3, covmat = Harman23.cor, cor = TRUE)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.127 0.005 0.193 0.157 0.090
## bitro.diameter chest.girth chest.width
## 0.359 0.411 0.490
##
## Loadings:
## Factor1 Factor2 Factor3
## height 0.886 0.267 -0.130
## arm.span 0.937 0.195 0.280
## forearm 0.874 0.188
## lower.leg 0.877 0.230 -0.145
## weight 0.242 0.916 -0.106
## bitro.diameter 0.193 0.777
## chest.girth 0.137 0.755
## chest.width 0.261 0.646 0.159
##
## Factor1 Factor2 Factor3
## SS loadings 3.379 2.628 0.162
## Proportion Var 0.422 0.329 0.020
## Cumulative Var 0.422 0.751 0.771
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 22.81 on 7 degrees of freedom.
## The p-value is 0.00184
factanal(factors = 3, covmat = Harman23.cor)
##
## Call:
## factanal(factors = 3, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.127 0.005 0.193 0.157 0.090
## bitro.diameter chest.girth chest.width
## 0.359 0.411 0.490
##
## Loadings:
## Factor1 Factor2 Factor3
## height 0.886 0.267 -0.130
## arm.span 0.937 0.195 0.280
## forearm 0.874 0.188
## lower.leg 0.877 0.230 -0.145
## weight 0.242 0.916 -0.106
## bitro.diameter 0.193 0.777
## chest.girth 0.137 0.755
## chest.width 0.261 0.646 0.159
##
## Factor1 Factor2 Factor3
## SS loadings 3.379 2.628 0.162
## Proportion Var 0.422 0.329 0.020
## Cumulative Var 0.422 0.751 0.771
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 22.81 on 7 degrees of freedom.
## The p-value is 0.00184
factanal(factors = 2, covmat = Harman23.cor)
##
## Call:
## factanal(factors = 2, covmat = Harman23.cor)
##
## Uniquenesses:
## height arm.span forearm lower.leg weight
## 0.170 0.107 0.166 0.199 0.089
## bitro.diameter chest.girth chest.width
## 0.364 0.416 0.537
##
## Loadings:
## Factor1 Factor2
## height 0.865 0.287
## arm.span 0.927 0.181
## forearm 0.895 0.179
## lower.leg 0.859 0.252
## weight 0.233 0.925
## bitro.diameter 0.194 0.774
## chest.girth 0.134 0.752
## chest.width 0.278 0.621
##
## Factor1 Factor2
## SS loadings 3.335 2.617
## Proportion Var 0.417 0.327
## Cumulative Var 0.417 0.744
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 75.74 on 13 degrees of freedom.
## The p-value is 6.94e-11
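Answer: Neither model is adequate by the likelihood ratio test. The hypothesis that 2 factors are sufficient is rejected (chi-square = 75.74 on 13 df, p = 6.94e-11), and so is the hypothesis that 3 factors are sufficient (chi-square = 22.81 on 7 df, p = 0.00184). Adding cor = TRUE makes no difference, because Harman23.cor is already a correlation matrix. Of the two, I would prefer the 3-factor model because it fits better and accounts for slightly more variability (77.1% versus 74.4%), although the third factor adds only 2% and the uniqueness of arm.span sits at 0.005, the default lower bound in factanal.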
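The test that factanal prints can also be pulled out of the fitted objects, which makes it easier to compare candidate numbers of factors side by side. The following is a minimal sketch, assuming the STATISTIC, dof and PVAL components that factanal returns when the number of observations is known; the names ks, fits and lrt are only for this illustration.

```r
library(datasets)

# Fit the 2- and 3-factor models used above
ks <- 2:3
fits <- lapply(ks, function(k) factanal(factors = k, covmat = Harman23.cor))

# Collect the likelihood ratio statistic, its degrees of freedom and p-value
lrt <- data.frame(
  factors = ks,
  chisq   = sapply(fits, function(f) unname(f$STATISTIC)),
  df      = sapply(fits, function(f) f$dof),
  p.value = sapply(fits, function(f) unname(f$PVAL))
)

# Degrees of freedom checked by hand: ((p - k)^2 - (p + k)) / 2
p <- nrow(Harman23.cor$cov)
lrt$df.check <- ((p - ks)^2 - (p + ks)) / 2

# TRUE means "k factors are sufficient" is rejected at the 5% level
lrt$reject <- lrt$p.value < 0.05
print(lrt)
```

A small p-value is evidence against the k-factor model, so a row with reject = TRUE marks a number of factors that the test does not support.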
#8.5 Examine the USJudgeRatings data in the datasets library. This dataset contains the ratings of 43 US Superior Court judges by attorneys. Each of the judges is evaluated on each of 12 attributes such as demeanor, preparation for trial, sound rulings, and the number of contacts each attorney had with that judge. See the R help file for more information on this dataset. (a) Examine the pair-wise scatterplot for this data (with the pairs command) to reveal that some variables are very highly correlated.
data(USJudgeRatings)
head(USJudgeRatings)
## CONT INTG DMNR DILG CFMG DECI PREP FAMI ORAL WRIT PHYS RTEN
## AARONSON,L.H. 5.7 7.9 7.7 7.3 7.1 7.4 7.1 7.1 7.1 7.0 8.3 7.8
## ALEXANDER,J.M. 6.8 8.9 8.8 8.5 7.8 8.1 8.0 8.0 7.8 7.9 8.5 8.7
## ARMENTANO,A.J. 7.2 8.1 7.8 7.8 7.5 7.6 7.5 7.5 7.3 7.4 7.9 7.8
## BERDON,R.I. 6.8 8.8 8.5 8.8 8.3 8.5 8.7 8.7 8.4 8.5 8.8 8.7
## BRACKEN,J.J. 7.3 6.4 4.3 6.5 6.0 6.2 5.7 5.7 5.1 5.3 5.5 4.8
## BURNS,E.B. 6.2 8.8 8.7 8.5 7.9 8.0 8.1 8.0 8.0 8.0 8.6 8.6
pairs(USJudgeRatings)
pcjudge <- princomp(USJudgeRatings)
summary(pcjudge)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.9944812 0.97631616 0.54821237 0.46436892 0.260897305
## Proportion of Variance 0.8476102 0.09010189 0.02840865 0.02038352 0.006434164
## Cumulative Proportion 0.8476102 0.93771205 0.96612070 0.98650423 0.992938391
## Comp.6 Comp.7 Comp.8 Comp.9
## Standard deviation 0.172098999 0.127444176 0.107411902 0.0862216855
## Proportion of Variance 0.002799688 0.001535299 0.001090581 0.0007027259
## Cumulative Proportion 0.995738079 0.997273378 0.998363959 0.9990666848
## Comp.10 Comp.11 Comp.12
## Standard deviation 0.0703843063 0.0550575991 0.0434546479
## Proportion of Variance 0.0004682789 0.0002865415 0.0001784947
## Cumulative Proportion 0.9995349638 0.9998215053 1.0000000000
pcjudge$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## CONT 0.933 0.320 0.113
## INTG -0.235 -0.139 0.370 0.252 0.463 -0.366 -0.418 0.377 -0.180
## DMNR -0.348 -0.232 0.663 -0.194 -0.361 0.394 0.167 0.123
## DILG -0.287 -0.224 0.273 -0.376 0.564 0.255 0.283 0.416
## CFMG -0.272 0.163 -0.189 -0.480 -0.170 0.109 -0.680 -0.269 -0.132
## DECI -0.253 0.118 -0.249 -0.420 -0.369 -0.483 0.318 0.408
## PREP -0.309 -0.217 0.191 0.146 0.384 0.169 -0.641
## FAMI -0.305 -0.267 0.169 0.471 -0.108 0.229 -0.123
## ORAL -0.332 0.253 -0.142 -0.117 -0.272 0.355
## WRIT -0.314 -0.115 0.142 0.295 -0.227 -0.102 -0.142 0.435
## PHYS -0.278 -0.859 0.241 0.159 0.275
## RTEN -0.359 0.196 -0.153 0.164 -0.466 0.294 -0.625 -0.155
## Comp.11 Comp.12
## CONT
## INTG -0.160
## DMNR 0.113
## DILG
## CFMG 0.194
## DECI -0.187
## PREP -0.340 0.293
## FAMI 0.535 -0.468
## ORAL -0.637 -0.430
## WRIT 0.106 0.703
## PHYS
## RTEN 0.245
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083 0.083
## Cumulative Var 0.083 0.167 0.250 0.333 0.417 0.500 0.583 0.667 0.750
## Comp.10 Comp.11 Comp.12
## SS loadings 1.000 1.000 1.000
## Proportion Var 0.083 0.083 0.083
## Cumulative Var 0.833 0.917 1.000
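Answer: The first two principal components together account for about 94% of the variability (84.8% and 9.0%). The first component loads with the same sign and similar size (roughly -0.24 to -0.36) on every rating except CONT, so it is essentially an overall rating of the judge; this reflects the very high correlations among those variables seen in the pairs plot. The second component is dominated by CONT (loading 0.933), the number of contacts, which is nearly unrelated to the other ratings. The 0.933 is a loading, not a proportion of variance.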
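The two observations above (highly correlated ratings, and CONT standing apart) can be checked numerically rather than by eye. This is a rough sketch; the 0.95 cutoff and the names r, idx, high, cum_var and top_var are arbitrary choices for this illustration.

```r
library(datasets)

# Pairs of ratings with correlation above 0.95 (upper triangle only)
r <- cor(USJudgeRatings)
idx <- which(r > 0.95 & upper.tri(r), arr.ind = TRUE)
high <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = round(r[idx], 3)
)
print(high[order(-high$r), ])

# Cumulative proportion of variance from the covariance-based PCA used above
pc <- princomp(USJudgeRatings)
cum_var <- cumsum(pc$sdev^2) / sum(pc$sdev^2)
print(round(cum_var[1:3], 3))

# Variable with the largest absolute loading on each of the first two components
top_var <- apply(abs(unclass(loadings(pc))[, 1:2]), 2,
                 function(x) names(which.max(x)))
print(top_var)
```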
#8.6 (a-b, d) Six different tests of intelligence and ability were administered to 112 people. The covariance matrix (but not the original data) of the test results is given in ability.cov in the datasets library. The six tests are called general, picture, blocks, maze, reading, and vocabulary. More information is given in the R help file. (a) Perform a factor analysis on the covariance matrix with factanal(factors = 2, covmat=ability.cov)
data(ability.cov)
head(ability.cov)
## $cov
## general picture blocks maze reading vocab
## general 24.641 5.991 33.520 6.023 20.755 29.701
## picture 5.991 6.700 18.137 1.782 4.936 7.204
## blocks 33.520 18.137 149.831 19.424 31.430 50.753
## maze 6.023 1.782 19.424 12.711 4.757 9.075
## reading 20.755 4.936 31.430 4.757 52.604 66.762
## vocab 29.701 7.204 50.753 9.075 66.762 135.292
##
## $center
## [1] 0 0 0 0 0 0
##
## $n.obs
## [1] 112
factanal(factors = 2, covmat=ability.cov)
##
## Call:
## factanal(factors = 2, covmat = ability.cov)
##
## Uniquenesses:
## general picture blocks maze reading vocab
## 0.455 0.589 0.218 0.769 0.052 0.334
##
## Loadings:
## Factor1 Factor2
## general 0.499 0.543
## picture 0.156 0.622
## blocks 0.206 0.860
## maze 0.109 0.468
## reading 0.956 0.182
## vocab 0.785 0.225
##
## Factor1 Factor2
## SS loadings 1.858 1.724
## Proportion Var 0.310 0.287
## Cumulative Var 0.310 0.597
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 6.11 on 4 degrees of freedom.
## The p-value is 0.191
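One way to read the grouping off the loadings above is to assign each test to the factor on which it loads most heavily. This is only a sketch of that idea; the names fa, L and dominant are introduced here for illustration, and a hard assignment like this is a simplification.

```r
library(datasets)

fa <- factanal(factors = 2, covmat = ability.cov)

# Plain numeric matrix of rotated loadings (rows = tests, columns = factors)
L <- unclass(loadings(fa))

# For each test, the factor on which it loads most heavily in absolute value
dominant <- colnames(L)[apply(abs(L), 1, which.max)]
names(dominant) <- rownames(L)
print(dominant)

# Tests grouped by their dominant factor
print(split(names(dominant), dominant))
```

In the output above, reading and vocab fall on Factor1 (a verbal factor), while picture, blocks and maze fall on Factor2 (a spatial or non-verbal factor). The general test loads almost equally on both (0.499 and 0.543), so its assignment to a single factor should not be over-read.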
Use the loadings to identify those variables that group together within the first two factors. Interpret these factors. (b) Perform a principal components analysis using the covariance matrix
# data(ability.cov)
# attach(ability.cov)
#
# pc <- princomp(ability.cor)
and identify the variables making the largest contributions to the first two principal components. How do you interpret these principal components?
ability.cor <- cov2cor(ability.cov$cov)
princomp(ability.cor)
## Call:
## princomp(x = ability.cor)
##
## Standard deviations:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## 0.4776241 0.3448660 0.1981002 0.1570958 0.0818317 0.0000000
##
## 6 variables and 6 observations.
princomp(ability.cor)$loadings
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## general 0.136 0.226 0.794 0.522 0.161
## picture -0.280 0.683 -0.284 0.157 0.589
## blocks -0.263 0.252 0.499 -0.783
## maze -0.366 -0.645 0.179 0.644
## reading 0.599 -0.708 0.358
## vocab 0.584 -0.283 0.683 0.327
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.167 0.167 0.167 0.167 0.167 0.167
## Cumulative Var 0.167 0.333 0.500 0.667 0.833 1.000
obtains the correlation and performs the principal components analysis. Examine the loadings and interpret the first two principal components. Compare this data summary with parts (a) and (b). How do these differ? How are they similar?
Answer
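For the second PCA, reading (0.599) and vocab (0.584) have the largest loadings on the first component, so it is driven mainly by the two verbal tests. On the second component, picture (0.683) and maze (-0.645) have the largest loadings, with opposite signs, so it contrasts those two tests. These loadings are weights, not percentages of variability. The two components are orthogonal by construction, so they carry uncorrelated information. Note that princomp(ability.cor) treats the 6 x 6 correlation matrix as six observations on six variables, which is why the output reports "6 variables and 6 observations" and a sixth standard deviation of zero; the results should be read with that in mind.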
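To analyze the correlation matrix itself rather than treating its rows as observations, princomp accepts it through the covmat argument. The following is a minimal sketch of that call; pc_cor and prop_var are names chosen for this illustration, and no output is shown because it was not part of the original run.

```r
library(datasets)

ability.cor <- cov2cor(ability.cov$cov)

# covmat = tells princomp the input is a covariance/correlation matrix,
# not a data matrix with six rows of observations
pc_cor <- princomp(covmat = ability.cor)

# Component variances are the eigenvalues; for a 6 x 6 correlation matrix
# they sum to 6
prop_var <- pc_cor$sdev^2 / sum(pc_cor$sdev^2)
print(round(prop_var, 3))

# Loadings on the first two components
print(round(unclass(loadings(pc_cor))[, 1:2], 3))
```

Because the eigenvalues now sum to the number of variables, prop_var gives the share of total standardized variance carried by each component, which is the quantity to compare with the factor analysis in part (a).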
#8.11 (a-b).
The Synthetic Aperture Personality Assessment (SAPA) Project is a web-based psychological data collection. A subset of the data is available in R as bfi in the psych library. This subset contains data on three demographic variables and 25 personality items submitted by 2800 volunteers. As examples of these items, we have: – I know how to comfort others. – I waste my time. – I make friends easily. Each item is rated on a scale of 1–7, on whether the respondent feels that he or she agrees with the statement a lot, disagrees a lot, or falls somewhere in between. See the bfi help file for more details.
library(psych)
data(bfi)
new_bfi <- bfi[complete.cases(bfi), ]
factanal(new_bfi, factors = 3)
##
## Call:
## factanal(x = new_bfi, factors = 3)
##
## Uniquenesses:
## A1 A2 A3 A4 A5 C1 C2 C3
## 0.940 0.721 0.616 0.814 0.579 0.661 0.615 0.722
## C4 C5 E1 E2 E3 E4 E5 N1
## 0.532 0.635 0.714 0.579 0.594 0.493 0.630 0.351
## N2 N3 N4 N5 O1 O2 O3 O4
## 0.366 0.468 0.604 0.725 0.882 0.918 0.824 0.956
## O5 gender education age
## 0.946 0.948 0.994 0.975
##
## Loadings:
## Factor1 Factor2 Factor3
## A1 -0.205 0.132
## A2 0.517 0.105
## A3 0.614
## A4 0.396 0.144
## A5 0.622 -0.172
## C1 0.109 0.569
## C2 0.115 0.113 0.599
## C3 0.100 0.517
## C4 0.169 -0.657
## C5 -0.168 0.232 -0.532
## E1 -0.533
## E2 -0.592 0.233 -0.127
## E3 0.623 0.133
## E4 0.691 -0.162
## E5 0.496 0.347
## N1 0.796 -0.123
## N2 0.791
## N3 0.717 -0.130
## N4 -0.242 0.545 -0.202
## N5 0.498 -0.160
## O1 0.242 0.243
## O2 0.127 -0.256
## O3 0.363 0.208
## O4 0.206
## O5 -0.219
## gender 0.198 0.110
## education
## age 0.113
##
## Factor1 Factor2 Factor3
## SS loadings 3.349 2.644 2.203
## Proportion Var 0.120 0.094 0.079
## Cumulative Var 0.120 0.214 0.293
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 4818.04 on 297 degrees of freedom.
## The p-value is 0
Answer
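Three factors do not summarize these measures adequately: the test rejects the hypothesis that 3 factors are sufficient (chi-square = 4818.04 on 297 df, reported p-value of 0), and the three factors together account for only 29.3% of the variance. The first factor loads most heavily on the A and E items (for example A5, 0.622, and E4, 0.691), the second on the N items (N1, 0.796, and N2, 0.791), and the third on the C items (C4, -0.657, and C2, 0.599).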