01 Principal Component Analysis, Item Analysis and Confirmatory Factor Analysis
Class on 28 August 2021
In the first section of the S3729C Data Analytics Seminar, you will be led through the process of Principal Component Analysis (PCA). This will be done using the Community2Campus base data as an example.
For more information about PCA, you may refer to the following resources
In this section, you are not required to carry out any hands-on practice. It is meant to give you an appreciation of the process that goes into conducting exploratory data analysis, and to help you better visualise the variation present in a dataset with many variables, or a “wide” dataset as we refer to it in the field.
This is especially pertinent when you have “wide” datasets such as the Community2Campus dataset, which you are already familiar with from Lessons 06 to 08, as well as from your CW1 presentations in Lesson 10.
This is the code for installation of Pacman which is used to load all packages later.
## package 'pacman' successfully unpacked and MD5 sums checked
##
## The downloaded binary packages are in
## C:\Users\aaron_chen_angus\AppData\Local\Temp\RtmpY7sLRp\downloaded_packages
Load the packages required for this section
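A minimal sketch of the install-and-load step is shown here. The package list is an assumption inferred from the functions used later in this section (`psych` for the PCA, item analysis and IRT functions, `dplyr` and `magrittr` for `mutate()`, `select()` and the pipes, `lavaan` for the CFA); the original chunk may have loaded a different set.

```r
# Install pacman only if it is not already available
if (!requireNamespace("pacman", quietly = TRUE)) {
  install.packages("pacman")
}

# p_load() installs any missing package and then attaches it.
# Assumed package list (not shown in the original output):
pacman::p_load(psych, dplyr, magrittr, lavaan)
```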
You can now import the PCA data from the CSV file, which I have placed online on GitHub at https://raw.githubusercontent.com/aaron-chen-angus/community2campus/main/PCArevdata.csv, using the read.csv command.
Check on the output by reading the column names
## [1] "NCSS_P1" "NCSS_P2" "NCSS_P3" "NCSS_P4" "NCSS_P5" "AT_PT1" "AT_PT2"
## [8] "AT_PT3" "AT_WT1" "AT_WT2" "AT_WT3" "AT_WT4" "AT_WT5" "AT_CT1"
## [15] "AT_CT2" "AT_CT3" "AT_CT4" "AT_CT5" "AT_LA1" "AT_LA2" "AT_LA3"
## [22] "AT_LA4" "AT_LA5" "AT_SC1" "AT_SC2" "AT_SC3" "AT_SC4" "AT_SC5"
## [29] "AT_SC6" "ST_SD1" "ST_SD2" "ST_SD3" "ST_SD4" "ST_SD5" "ST_SR2"
## [36] "ST_SR1" "ST_SR3" "ST_SR4" "ST_SR5"
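The import and the column check could be reproduced roughly as follows. The object name `df` matches the name used in the code later in this section; all `read.csv` arguments other than the URL are left at their defaults, which is an assumption.

```r
# URL given in the text of this section
url <- "https://raw.githubusercontent.com/aaron-chen-angus/community2campus/main/PCArevdata.csv"

# Read the CSV straight from GitHub into a data frame
df <- read.csv(url)

# List the column names; the output in this section shows 39 item columns
names(df)
```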
There are three methods of PCA available in R
We will use the default method prcomp first
Get summary stats
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 3.9552 2.2443 1.28210 1.24473 1.17405 0.97936 0.93336
## Proportion of Variance 0.4011 0.1291 0.04215 0.03973 0.03534 0.02459 0.02234
## Cumulative Proportion 0.4011 0.5303 0.57241 0.61214 0.64748 0.67208 0.69442
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.91529 0.91023 0.84528 0.80615 0.77563 0.76162 0.74171
## Proportion of Variance 0.02148 0.02124 0.01832 0.01666 0.01543 0.01487 0.01411
## Cumulative Proportion 0.71590 0.73714 0.75546 0.77212 0.78755 0.80242 0.81653
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.72811 0.70806 0.69206 0.6756 0.66110 0.65390 0.6244
## Proportion of Variance 0.01359 0.01286 0.01228 0.0117 0.01121 0.01096 0.0100
## Cumulative Proportion 0.83012 0.84298 0.85526 0.8670 0.87817 0.88913 0.8991
## PC22 PC23 PC24 PC25 PC26 PC27 PC28
## Standard deviation 0.60497 0.59783 0.56482 0.55525 0.52770 0.51575 0.4997
## Proportion of Variance 0.00938 0.00916 0.00818 0.00791 0.00714 0.00682 0.0064
## Cumulative Proportion 0.90851 0.91768 0.92586 0.93376 0.94090 0.94772 0.9541
## PC29 PC30 PC31 PC32 PC33 PC34 PC35
## Standard deviation 0.48621 0.46285 0.45290 0.44246 0.42277 0.39387 0.36975
## Proportion of Variance 0.00606 0.00549 0.00526 0.00502 0.00458 0.00398 0.00351
## Cumulative Proportion 0.96019 0.96568 0.97094 0.97596 0.98054 0.98452 0.98803
## PC36 PC37 PC38 PC39
## Standard deviation 0.36186 0.35104 0.34942 0.30114
## Proportion of Variance 0.00336 0.00316 0.00313 0.00233
## Cumulative Proportion 0.99138 0.99454 0.99767 1.00000
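To see where the three rows of this summary come from, here is a small sketch on a stand-in dataset (`USArrests`, which ships with R) rather than the Community2Campus data. The use of `scale. = TRUE` is an assumption about the original call; it is consistent with the table, where the variance of PC1 (3.9552 squared, about 15.64) divided by its proportion (0.4011) gives about 39, the number of items.

```r
# Stand-in data: four numeric variables from a built-in dataset
pc <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components
eig <- pc$sdev^2

# Each row of the summary table can be rebuilt from the eigenvalues
prop_var <- eig / sum(eig)    # "Proportion of Variance"
cum_var  <- cumsum(prop_var)  # "Cumulative Proportion"

# With standardised variables the eigenvalues sum to the number of variables
sum(eig)

# Compare the table printed by summary() with the hand-built version
summary(pc)$importance
round(rbind(pc$sdev, prop_var, cum_var), 5)
```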
Get a screeplot of the eigenvalues
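As a rough illustration of how a scree plot is read, the sketch here rebuilds the first seven eigenvalues from the standard deviations reported in the prcomp summary and counts how many exceed 1 (the Kaiser rule). The Kaiser rule is only one heuristic and is brought in for illustration; the original scree plot may have been drawn with a different function.

```r
# Standard deviations of PC1 to PC7, copied from the prcomp summary output
sdev <- c(3.9552, 2.2443, 1.28210, 1.24473, 1.17405, 0.97936, 0.93336)

# Eigenvalue = variance of the component = sdev squared
eig <- sdev^2
round(eig, 2)

# Scree plot: eigenvalues against component number
plot(eig, type = "b", xlab = "Component", ylab = "Eigenvalue",
     main = "Scree plot (first seven components)")
abline(h = 1, lty = 2)  # Kaiser reference line

# Number of components with an eigenvalue above 1
n_keep <- sum(eig > 1)
n_keep
```

Five components clear the reference line. The remaining components all have smaller standard deviations than PC7, so none of them can exceed 1 either.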
##
## Very Simple Structure
## Call: vss(x = ., n = 10)
## VSS complexity 1 achieves a maximimum of 0.86 with 1 factors
## VSS complexity 2 achieves a maximimum of 0.95 with 2 factors
##
## The Velicer MAP achieves a minimum of 0.01 with 5 factors
## BIC achieves a minimum of -137.51 with 10 factors
## Sample Size adjusted BIC achieves a minimum of 1120.59 with 10 factors
##
## Statistics by number of factors
## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq
## 1 0.86 0.00 0.048 702 23939 0 40.1 0.86 0.131 18630 20861 1.0 45097
## 2 0.67 0.95 0.014 664 11597 0 15.0 0.95 0.093 6575 8685 1.3 7471
## 3 0.54 0.85 0.012 627 8999 0 12.6 0.96 0.083 4258 6250 1.7 5105
## 4 0.60 0.91 0.011 591 7079 0 10.4 0.96 0.076 2610 4488 1.7 3176
## 5 0.47 0.84 0.011 556 5753 0 8.7 0.97 0.070 1548 3315 2.1 1959
## 6 0.52 0.88 0.011 522 4978 0 7.9 0.97 0.067 1031 2689 2.0 1522
## 7 0.48 0.85 0.012 489 4403 0 7.3 0.97 0.064 705 2259 2.1 1199
## 8 0.48 0.86 0.013 457 3735 0 6.7 0.98 0.061 279 1731 2.2 923
## 9 0.46 0.86 0.014 426 3320 0 6.3 0.98 0.059 98 1452 2.2 744
## 10 0.47 0.86 0.016 396 2857 0 6.0 0.98 0.057 -138 1121 2.4 606
## SRMR eCRMS eBIC
## 1 0.126 0.129 39788
## 2 0.051 0.054 2449
## 3 0.042 0.046 364
## 4 0.033 0.037 -1293
## 5 0.026 0.030 -2246
## 6 0.023 0.028 -2425
## 7 0.021 0.025 -2498
## 8 0.018 0.023 -2533
## 9 0.016 0.021 -2478
## 10 0.015 0.020 -2388
##
## Number of factors
## Call: vss(x = x, n = n, rotate = rotate, diagonal = diagonal, fm = fm,
## n.obs = n.obs, plot = FALSE, title = title, use = use, cor = cor)
## VSS complexity 1 achieves a maximimum of 0.86 with 1 factors
## VSS complexity 2 achieves a maximimum of 0.95 with 2 factors
## The Velicer MAP achieves a minimum of 0.01 with 5 factors
## Empirical BIC achieves a minimum of -2532.75 with 8 factors
## Sample Size adjusted BIC achieves a minimum of 1120.59 with 10 factors
##
## Statistics by number of factors
## vss1 vss2 map dof chisq prob sqresid fit RMSEA BIC SABIC complex eChisq
## 1 0.86 0.00 0.048 702 23939 0 40.1 0.86 0.131 18630 20861 1.0 45097
## 2 0.67 0.95 0.014 664 11597 0 15.0 0.95 0.093 6575 8685 1.3 7471
## 3 0.54 0.85 0.012 627 8999 0 12.6 0.96 0.083 4258 6250 1.7 5105
## 4 0.60 0.91 0.011 591 7079 0 10.4 0.96 0.076 2610 4488 1.7 3176
## 5 0.47 0.84 0.011 556 5753 0 8.7 0.97 0.070 1548 3315 2.1 1959
## 6 0.52 0.88 0.011 522 4978 0 7.9 0.97 0.067 1031 2689 2.0 1522
## 7 0.48 0.85 0.012 489 4403 0 7.3 0.97 0.064 705 2259 2.1 1199
## 8 0.48 0.86 0.013 457 3735 0 6.7 0.98 0.061 279 1731 2.2 923
## 9 0.46 0.86 0.014 426 3320 0 6.3 0.98 0.059 98 1452 2.2 744
## 10 0.47 0.86 0.016 396 2857 0 6.0 0.98 0.057 -138 1121 2.4 606
## SRMR eCRMS eBIC
## 1 0.126 0.129 39788
## 2 0.051 0.054 2449
## 3 0.042 0.046 364
## 4 0.033 0.037 -1293
## 5 0.026 0.030 -2246
## 6 0.023 0.028 -2425
## 7 0.021 0.025 -2498
## 8 0.018 0.023 -2533
## 9 0.016 0.021 -2478
## 10 0.015 0.020 -2388
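The BIC column in these tables can be checked by hand. The sketch assumes the usual form BIC = chi-square - dof * log(N), with N = 1924 taken from the outputs later in this section; small differences from the printed values come from the rounded chi-square figures.

```r
n_obs <- 1924  # sample size reported in the later outputs

# Chi-square and degrees of freedom for 1, 2, 5 and 10 factors (from the table)
chisq <- c(23939, 11597, 5753, 2857)
dof   <- c(702, 664, 556, 396)

# BIC penalises each degree of freedom by log(N)
bic <- chisq - dof * log(n_obs)
round(bic)
```

The BIC only turns negative at 10 factors, which is why the output reports its minimum there, while the Velicer MAP criterion points to 5 factors.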
Calculate and plot factors with fa()
Plot the fa.diagram to visualise the rotated factor solution, and judge whether the solution is interpretable
Hierarchical clustering of items with iclust()
## ICLUST (Item Cluster Analysis)
## Call: iclust(r.mat = .)
##
## Purified Alpha:
## C37 C26
## 0.95 0.88
##
## G6* reliability:
## C37 C26
## 1.0 0.9
##
## Original Beta:
## C37 C26
## 0.75 0.70
##
## Cluster size:
## C37 C26
## 30 9
##
## Item by Cluster Structure matrix:
## O P C37 C26
## NCSS_P1 C37 C26 0.43 0.63
## NCSS_P2 C37 C37 0.32 -0.10
## NCSS_P3 C37 C37 0.56 0.24
## NCSS_P4 C37 C26 0.49 0.53
## NCSS_P5 C37 C26 0.55 0.65
## AT_PT1 C37 C37 0.78 0.38
## AT_PT2 C37 C37 0.79 0.30
## AT_PT3 C37 C37 0.87 0.35
## AT_WT1 C37 C26 0.65 0.82
## AT_WT2 C26 C26 0.06 -0.54
## AT_WT3 C26 C26 0.18 0.67
## AT_WT4 C26 C26 -0.23 -0.71
## AT_WT5 C37 C37 0.58 0.06
## AT_CT1 C37 C37 0.66 0.09
## AT_CT2 C37 C37 0.69 0.06
## AT_CT3 C37 C37 0.84 0.43
## AT_CT4 C37 C37 0.30 -0.17
## AT_CT5 C37 C37 0.55 0.50
## AT_LA1 C37 C37 0.65 0.18
## AT_LA2 C37 C37 0.37 -0.03
## AT_LA3 C37 C37 0.53 0.05
## AT_LA4 C26 C26 0.31 0.65
## AT_LA5 C37 C26 0.58 0.78
## AT_SC1 C37 C37 0.40 -0.02
## AT_SC2 C37 C37 0.80 0.46
## AT_SC3 C37 C37 0.68 0.19
## AT_SC4 C37 C37 0.42 0.10
## AT_SC5 C37 C37 0.29 0.06
## AT_SC6 C37 C37 0.85 0.36
## ST_SD1 C37 C37 0.87 0.41
## ST_SD2 C37 C37 0.90 0.41
## ST_SD3 C37 C37 0.88 0.38
## ST_SD4 C37 C37 0.59 0.19
## ST_SD5 C37 C37 0.51 0.45
## ST_SR2 C37 C37 0.75 0.74
## ST_SR1 C37 C37 0.74 0.65
## ST_SR3 C37 C37 0.76 0.70
## ST_SR4 C37 C37 0.73 0.69
## ST_SR5 C37 C37 0.61 0.57
##
## With eigenvalues of:
## C37 C26
## 13.9 5.5
##
## Purified scale intercorrelations
## reliabilities on diagonal
## correlations corrected for attenuation above diagonal:
## C37 C26
## C37 0.95 0.52
## C26 0.48 0.88
##
## Cluster fit = 0.86 Pattern fit = 0.98 RMSR = 0.07
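First PCA with no rotation specified (so principal() applies its default varimax rotation, hence the RC labels in the output), specify 5 components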
## Principal Components Analysis
## Call: principal(r = ., nfactors = 5)
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC2 RC1 RC4 RC3 RC5 h2 u2 com
## NCSS_P1 0.44 0.11 -0.20 0.59 0.09 0.60 0.40 2.3
## NCSS_P2 0.02 0.09 0.64 0.38 0.23 0.61 0.39 2.0
## NCSS_P3 0.17 0.35 0.30 0.53 0.25 0.59 0.41 3.2
## NCSS_P4 0.43 0.19 0.01 0.69 -0.19 0.74 0.26 2.0
## NCSS_P5 0.54 0.15 -0.01 0.68 -0.03 0.78 0.22 2.0
## AT_PT1 0.33 0.70 0.03 0.23 0.13 0.68 0.32 1.8
## AT_PT2 0.34 0.69 0.19 0.16 0.11 0.67 0.33 1.8
## AT_PT3 0.41 0.73 0.18 0.10 0.21 0.79 0.21 2.0
## AT_WT1 0.78 0.25 -0.20 0.18 0.12 0.76 0.24 1.5
## AT_WT2 -0.32 0.15 0.64 0.00 -0.03 0.54 0.46 1.6
## AT_WT3 0.51 -0.12 -0.43 0.18 0.17 0.52 0.48 2.6
## AT_WT4 -0.42 -0.04 0.55 -0.33 -0.11 0.61 0.39 2.7
## AT_WT5 0.26 0.58 0.35 -0.10 -0.16 0.56 0.44 2.3
## AT_CT1 0.27 0.58 0.40 -0.05 0.07 0.58 0.42 2.3
## AT_CT2 0.19 0.66 0.36 0.00 0.13 0.62 0.38 1.8
## AT_CT3 0.49 0.67 0.11 0.07 0.20 0.74 0.26 2.1
## AT_CT4 -0.01 0.34 0.35 -0.28 0.21 0.37 0.63 3.6
## AT_CT5 0.76 0.13 0.16 0.00 -0.02 0.61 0.39 1.2
## AT_LA1 0.22 0.68 0.12 -0.02 0.12 0.54 0.46 1.3
## AT_LA2 -0.22 0.61 -0.07 0.08 0.28 0.51 0.49 1.8
## AT_LA3 -0.10 0.78 -0.07 0.10 0.07 0.64 0.36 1.1
## AT_LA4 0.53 0.02 -0.38 0.15 0.22 0.50 0.50 2.4
## AT_LA5 0.77 0.13 -0.15 0.23 0.19 0.72 0.28 1.4
## AT_SC1 0.16 0.21 0.49 0.03 0.33 0.42 0.58 2.4
## AT_SC2 0.61 0.48 0.17 -0.04 0.35 0.76 0.24 2.8
## AT_SC3 0.34 0.45 0.37 0.05 0.33 0.57 0.43 3.8
## AT_SC4 0.19 0.26 0.17 -0.16 0.62 0.54 0.46 1.9
## AT_SC5 -0.06 0.23 0.01 0.12 0.69 0.55 0.45 1.3
## AT_SC6 0.44 0.71 0.14 0.01 0.24 0.78 0.22 2.0
## ST_SD1 0.46 0.72 0.11 0.10 0.18 0.79 0.21 2.0
## ST_SD2 0.45 0.76 0.11 0.14 0.14 0.83 0.17 1.8
## ST_SD3 0.43 0.75 0.13 0.16 0.10 0.80 0.20 1.8
## ST_SD4 0.07 0.75 -0.07 0.20 -0.09 0.61 0.39 1.2
## ST_SD5 0.58 0.25 0.00 0.17 -0.19 0.47 0.53 1.8
## ST_SR2 0.87 0.27 0.01 0.12 0.14 0.86 0.14 1.3
## ST_SR1 0.84 0.30 0.05 0.10 0.03 0.81 0.19 1.3
## ST_SR3 0.82 0.32 0.00 0.18 0.06 0.82 0.18 1.4
## ST_SR4 0.80 0.33 -0.04 0.21 -0.04 0.79 0.21 1.5
## ST_SR5 0.63 0.33 -0.04 0.25 -0.14 0.59 0.41 2.0
##
## RC2 RC1 RC4 RC3 RC5
## SS loadings 9.08 8.83 2.82 2.49 2.03
## Proportion Var 0.23 0.23 0.07 0.06 0.05
## Cumulative Var 0.23 0.46 0.53 0.60 0.65
## Proportion Explained 0.36 0.35 0.11 0.10 0.08
## Cumulative Proportion 0.36 0.71 0.82 0.92 1.00
##
## Mean item complexity = 2
## Test of the hypothesis that 5 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.04
## with the empirical chi square 3907.38 with prob < 0
##
## Fit based upon off diagonal values = 0.99
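The last three columns of the loadings table (h2, u2 and com) can be recomputed from a single row. The sketch uses the NCSS_P1 row; the complexity formula is Hofmann's index, which is my understanding of what `psych` reports as `com`, so treat that part as an assumption. Results differ slightly from the table because the loadings are rounded to two decimals.

```r
# Loadings of NCSS_P1 on RC2, RC1, RC4, RC3, RC5 (from the table)
loadings <- c(0.44, 0.11, -0.20, 0.59, 0.09)

# Communality: share of the item's variance explained by the five components
h2 <- sum(loadings^2)

# Uniqueness: the remainder
u2 <- 1 - h2

# Hofmann's complexity: close to 1 when the item loads on one component only
com <- sum(loadings^2)^2 / sum(loadings^4)

round(c(h2 = h2, u2 = u2, com = com), 2)
```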
Second PCA with oblimin (oblique) rotation
df %>%
  principal(
    nfactors = 5,
    rotate = "oblimin"
  ) %>%
  plot() # Plot position of variables on components
Forming a key for scale scores and a scoring key with indices defined by PCA for the variables
key <- list(
  MR1 = c(6, 7, 8, 13, 14, 15, 16, 17, 19, 20, 21, 29, 30, 31, 32, 33),
  MR2 = c(2, 10, 11, 12, 22),
  MR3 = c(9, 18, 23, 25, 34, 35, 36, 37, 38, 39),
  MR4 = c(1, 3, 4, 5),
  MR5 = c(24, 26, 27, 28)
)
Matrix of alphas, correlations, and correlations corrected for attenuation using scoreItems() from psych
## Call: scoreItems(keys = key, items = df)
##
## (Unstandardized) Alpha:
## MR1 MR2 MR3 MR4 MR5
## alpha 0.88 0.77 0.76 0.83 0.89
##
## Standard errors of unstandardized Alpha:
## MR1 MR2 MR3 MR4 MR5
## ASE 0.0061 0.015 0.011 0.015 0.013
##
## Average item correlation:
## MR1 MR2 MR3 MR4 MR5
## average.r 0.3 0.4 0.24 0.55 0.66
##
## Median item correlation:
## MR1 MR2 MR3 MR4 MR5
## 0.32 0.46 0.25 0.54 0.64
##
## Guttman 6* reliability:
## MR1 MR2 MR3 MR4 MR5
## Lambda.6 0.94 0.87 0.86 0.88 0.91
##
## Signal/Noise based upon av.r :
## MR1 MR2 MR3 MR4 MR5
## Signal/Noise 7 3.4 3.2 5 7.8
##
## Scale intercorrelations corrected for attenuation
## raw correlations below the diagonal, alpha on the diagonal
## corrected correlations above the diagonal:
## MR1 MR2 MR3 MR4 MR5
## MR1 0.88 1.04 1.03 0.98 0.89
## MR2 0.86 0.77 0.97 1.04 0.76
## MR3 0.85 0.75 0.76 0.95 0.81
## MR4 0.83 0.83 0.76 0.83 0.65
## MR5 0.78 0.63 0.67 0.55 0.89
##
## In order to see the item by scale loadings and frequency counts of the data
## print with the short option = FALSE
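The corrected correlations above the diagonal follow from the standard correction for attenuation, r_xy / sqrt(alpha_x * alpha_y). A quick check with the MR1 and MR2 entries of the printed matrix, using the rounded values shown there:

```r
# Values copied from the scoreItems() output
r_raw   <- 0.86  # raw correlation between MR1 and MR2
alpha_1 <- 0.88  # alpha for MR1
alpha_2 <- 0.77  # alpha for MR2

# Correction for attenuation: divide by the geometric mean of the reliabilities
r_corrected <- r_raw / sqrt(alpha_1 * alpha_2)
round(r_corrected, 2)
```

The result is about 1.04, matching the table. A corrected value above 1 is not a valid correlation; it suggests that the two scales overlap very heavily, or that alpha understates their reliability.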
Add Scale Scores to Data
df %<>%
  mutate(
    MR1 = rowMeans(df[c(6, 7, 8, 13, 14, 15, 16, 17, 19, 20, 21, 29, 30, 31, 32, 33)], na.rm = TRUE),
    MR2 = rowMeans(df[c(2, 10, 11, 12, 22)], na.rm = TRUE),
    MR3 = rowMeans(df[c(9, 18, 23, 25, 34, 35, 36, 37, 38, 39)], na.rm = TRUE),
    MR4 = rowMeans(df[c(1, 3, 4, 5)], na.rm = TRUE),
    MR5 = rowMeans(df[c(24, 26, 27, 28)], na.rm = TRUE)
  )
Descriptives for scale scores
## vars n mean sd median trimmed mad min max range skew kurtosis se
## MR1 1 1924 2.94 0.98 3.0 2.95 1.11 1.0 5.0 4.0 -0.06 -0.77 0.02
## MR2 2 1924 2.85 0.46 2.8 2.84 0.30 1.2 4.6 3.4 0.12 0.87 0.01
## MR3 3 1924 3.31 1.17 3.7 3.37 1.33 1.0 5.0 4.0 -0.39 -1.25 0.03
## MR4 4 1924 2.96 1.02 3.0 2.95 1.11 1.0 5.0 4.0 0.06 -0.81 0.02
## MR5 5 1924 2.83 0.86 3.0 2.83 0.74 1.0 5.0 4.0 0.06 -0.02 0.02
Plot of Means and SDs for Aggregated Variables
df %>%
  select(MR1:MR5) %>%
  error.bars(
    ylim = c(1, 5),
    sd = TRUE,
    arrow.col = "gray",
    eyes = FALSE,
    pch = 19,
    main = "M & SD of C2C Aggregated Factors",
    ylab = "Means",
    xlab = "C2C Aggregated Factors"
  )
The next plot combines histograms, scatterplots and correlations for the five aggregated factors, to summarise the outcome of the factor and item analysis done in the previous sections.
df %>%
  select(MR1:MR5) %>%
  pairs.panels(
    hist.col = "gray",
    jiggle = TRUE,
    main = "C2C Aggregated Factors"
  )
Cronbach’s alpha is a measure of internal consistency, that is, how closely related a set of items are as a group. We will be computing the Cronbach’s alpha for each of the aggregated factors and their components. This is an example with MR1.
##
## Reliability analysis
## Call: psych::alpha(x = .)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd median_r
## 0.95 0.95 0.96 0.53 18 0.0016 2.9 0.98 0.53
##
## lower alpha upper 95% confidence boundaries
## 0.95 0.95 0.95
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se var.r med.r
## AT_PT1 0.94 0.94 0.95 0.52 16 0.0017 0.029 0.53
## AT_PT2 0.94 0.94 0.95 0.52 16 0.0017 0.029 0.52
## AT_PT3 0.94 0.94 0.95 0.51 16 0.0018 0.027 0.52
## AT_WT5 0.95 0.95 0.95 0.54 17 0.0016 0.030 0.53
## AT_CT1 0.95 0.94 0.95 0.53 17 0.0017 0.030 0.52
## AT_CT2 0.95 0.94 0.95 0.53 17 0.0017 0.031 0.52
## AT_CT3 0.94 0.94 0.95 0.52 16 0.0018 0.029 0.52
## AT_CT4 0.95 0.95 0.96 0.56 19 0.0015 0.021 0.54
## AT_LA1 0.95 0.94 0.95 0.53 17 0.0017 0.031 0.53
## AT_LA2 0.95 0.95 0.96 0.55 19 0.0015 0.026 0.54
## AT_LA3 0.95 0.95 0.95 0.54 17 0.0016 0.031 0.54
## AT_SC6 0.94 0.94 0.95 0.51 16 0.0018 0.028 0.52
## ST_SD1 0.94 0.94 0.95 0.51 16 0.0018 0.027 0.52
## ST_SD2 0.94 0.94 0.95 0.51 16 0.0018 0.026 0.52
## ST_SD3 0.94 0.94 0.95 0.51 16 0.0018 0.026 0.52
## ST_SD4 0.95 0.95 0.95 0.54 17 0.0016 0.030 0.53
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## AT_PT1 1924 0.79 0.79 0.78 0.76 3.0 1.3
## AT_PT2 1924 0.82 0.82 0.81 0.79 2.9 1.3
## AT_PT3 1924 0.88 0.88 0.88 0.86 2.9 1.4
## AT_WT5 1924 0.65 0.66 0.62 0.60 2.7 1.2
## AT_CT1 1924 0.71 0.72 0.70 0.67 2.7 1.2
## AT_CT2 1924 0.76 0.76 0.75 0.72 2.8 1.2
## AT_CT3 1924 0.84 0.83 0.83 0.81 3.1 1.5
## AT_CT4 1924 0.42 0.42 0.37 0.35 2.7 1.2
## AT_LA1 1924 0.73 0.73 0.71 0.69 3.1 1.3
## AT_LA2 1924 0.49 0.51 0.46 0.44 3.0 1.1
## AT_LA3 1924 0.65 0.66 0.63 0.61 3.0 1.2
## AT_SC6 1924 0.87 0.86 0.86 0.84 3.1 1.4
## ST_SD1 1924 0.87 0.87 0.87 0.85 3.0 1.4
## ST_SD2 1924 0.89 0.89 0.89 0.87 3.1 1.4
## ST_SD3 1924 0.88 0.87 0.88 0.86 3.0 1.4
## ST_SD4 1924 0.67 0.67 0.64 0.62 2.9 1.2
##
## Non missing response frequency for each item
## 1 2 3 4 5 miss
## AT_PT1 0.15 0.22 0.28 0.16 0.19 0
## AT_PT2 0.17 0.23 0.28 0.16 0.17 0
## AT_PT3 0.20 0.25 0.20 0.15 0.21 0
## AT_WT5 0.20 0.21 0.36 0.13 0.10 0
## AT_CT1 0.17 0.27 0.30 0.16 0.09 0
## AT_CT2 0.17 0.25 0.32 0.16 0.10 0
## AT_CT3 0.20 0.18 0.19 0.17 0.26 0
## AT_CT4 0.20 0.24 0.31 0.16 0.09 0
## AT_LA1 0.14 0.19 0.31 0.20 0.17 0
## AT_LA2 0.10 0.23 0.38 0.20 0.10 0
## AT_LA3 0.09 0.25 0.34 0.19 0.14 0
## AT_SC6 0.17 0.23 0.23 0.12 0.25 0
## ST_SD1 0.19 0.20 0.24 0.12 0.25 0
## ST_SD2 0.18 0.19 0.24 0.16 0.23 0
## ST_SD3 0.19 0.21 0.24 0.16 0.21 0
## ST_SD4 0.15 0.19 0.39 0.15 0.12 0
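For intuition, the standardised alpha and the signal-to-noise ratio in the first line of this output can be recovered from just the number of items and the average inter-item correlation. The sketch uses the rounded values printed in the output, so the results are approximate.

```r
k     <- 16    # number of items in MR1
r_bar <- 0.53  # average inter-item correlation (average_r)

# Standardised alpha (Spearman-Brown form)
std_alpha <- (k * r_bar) / (1 + (k - 1) * r_bar)

# Signal-to-noise ratio: alpha / (1 - alpha)
s_n <- std_alpha / (1 - std_alpha)

round(c(std.alpha = std_alpha, S.N = s_n), 2)
```

Because alpha grows with the number of items, a 16-item scale can reach 0.95 even though the average correlation between items is only moderate.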
Item Response Theory Output for MR1
df %>%
  select(c(6, 7, 8, 13, 14, 15, 16, 17, 19, 20, 21, 29, 30, 31, 32, 33)) %>%
  irt.fa %T>% # T-pipe
  plot(main = "C2C Aggregated Factor : MR1") %>%
  print()
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = .)
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## AT_PT1 0.35 0.79 1.06 1.17 1.16 0.78 0.29
## AT_PT2 0.35 0.86 1.12 1.22 1.29 0.98 0.38
## AT_PT3 0.69 1.49 1.17 1.76 2.00 1.87 0.64
## AT_WT5 0.20 0.36 0.54 0.63 0.62 0.49 0.30
## AT_CT1 0.22 0.44 0.66 0.78 0.77 0.61 0.35
## AT_CT2 0.25 0.55 0.81 0.91 0.91 0.74 0.40
## AT_CT3 0.31 1.02 1.54 1.70 1.50 0.74 0.18
## AT_CT4 0.12 0.15 0.17 0.18 0.18 0.16 0.13
## AT_LA1 0.30 0.57 0.80 0.86 0.77 0.52 0.25
## AT_LA2 0.17 0.22 0.26 0.27 0.26 0.22 0.17
## AT_LA3 0.26 0.41 0.54 0.60 0.55 0.41 0.24
## AT_SC6 0.57 1.23 1.32 1.61 1.95 1.11 0.22
## ST_SD1 0.57 1.48 1.55 1.47 2.05 1.44 0.28
## ST_SD2 1.26 1.40 1.85 1.26 2.02 1.96 0.49
## ST_SD3 0.72 1.48 1.53 1.42 1.83 1.68 0.49
## ST_SD4 0.24 0.43 0.60 0.67 0.63 0.48 0.29
## Test Info 6.57 12.88 15.51 16.51 18.49 14.18 5.12
## SEM 0.39 0.28 0.25 0.25 0.23 0.27 0.44
## Reliability 0.85 0.92 0.94 0.94 0.95 0.93 0.80
##
## Factor analysis with Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
## fm = fm)
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 104 and the objective function was 1.82
## The number of observations was 1924 with Chi Square = 3495.98 with prob < 0
##
## The root mean square of the residuals (RMSA) is 0.05
## The df corrected root mean square of the residuals is 0.06
##
## Tucker Lewis Index of factoring reliability = 0.873
## RMSEA index = 0.13 and the 10 % confidence intervals are 0.127 0.134
## BIC = 2709.51
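The SEM and Reliability rows in the summary table are simple functions of the Test Info row. The sketch assumes a latent trait with unit variance, which is what makes reliability equal to 1 - 1/information.

```r
# Test information at theta = -3 ... 3, copied from the "Test Info" row for MR1
theta <- -3:3
info  <- c(6.57, 12.88, 15.51, 16.51, 18.49, 14.18, 5.12)

# Standard error of measurement: inverse square root of the information
sem <- 1 / sqrt(info)

# Conditional reliability, assuming unit variance of the latent trait
reliability <- 1 - 1 / info

round(data.frame(theta, info, sem, reliability), 2)
```

Whenever the test information drops below 1, as it does at the extremes of the shorter scales later in this section, this formula returns a negative reliability, which explains the negative entries there.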
Item Response Theory Output for MR2
df %>%
  select(c(2, 10, 11, 12, 22)) %>%
  irt.fa %T>% # T-pipe
  plot(main = "C2C Aggregated Factor : MR2") %>%
  print()
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = .)
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## AT_WT2 0.19 0.35 0.52 0.62 0.61 0.49 0.32
## AT_WT3 0.25 0.49 0.68 0.74 0.72 0.61 0.40
## AT_WT4 0.22 0.52 0.79 0.84 0.83 0.78 0.56
## AT_LA4 0.18 0.30 0.42 0.49 0.47 0.38 0.26
## Test Info 0.85 1.66 2.41 2.69 2.63 2.26 1.53
## SEM 1.08 0.78 0.64 0.61 0.62 0.66 0.81
## Reliability -0.17 0.40 0.58 0.63 0.62 0.56 0.35
##
## Factor analysis with Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
## fm = fm)
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 5 and the objective function was 0.1
## The number of observations was 1924 with Chi Square = 193.02 with prob < 8.9e-40
##
## The root mean square of the residuals (RMSA) is 0.07
## The df corrected root mean square of the residuals is 0.1
##
## Tucker Lewis Index of factoring reliability = 0.846
## RMSEA index = 0.14 and the 10 % confidence intervals are 0.123 0.157
## BIC = 155.2
Item Response Theory Output for MR3
df %>%
  select(c(9, 18, 23, 25, 34, 35, 36, 37, 38, 39)) %>%
  irt.fa %T>% # T-pipe
  plot(main = "C2C Aggregated Factor : MR3") %>%
  print()
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = .)
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## AT_WT1 0.52 1.14 1.49 1.43 1.00 0.39 0.10
## AT_CT5 0.30 0.57 0.79 0.85 0.70 0.43 0.21
## AT_LA5 0.40 0.99 1.43 1.48 1.03 0.39 0.10
## AT_SC2 0.21 0.53 0.96 1.17 0.92 0.47 0.18
## ST_SD5 0.29 0.49 0.66 0.68 0.57 0.37 0.20
## ST_SR2 1.42 2.49 1.44 2.24 4.47 0.50 0.00
## ST_SR1 1.07 2.17 2.16 2.19 1.82 1.16 0.11
## ST_SR3 2.13 1.68 1.59 2.39 1.13 2.19 0.26
## ST_SR4 1.00 1.64 1.88 1.80 1.51 0.88 0.13
## ST_SR5 0.46 0.75 0.85 0.83 0.72 0.45 0.20
## Test Info 7.81 12.45 13.25 15.06 13.85 7.23 1.49
## SEM 0.36 0.28 0.27 0.26 0.27 0.37 0.82
## Reliability 0.87 0.92 0.92 0.93 0.93 0.86 0.33
##
## Factor analysis with Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
## fm = fm)
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 35 and the objective function was 0.76
## The number of observations was 1924 with Chi Square = 1452.62 with prob < 7.1e-283
##
## The root mean square of the residuals (RMSA) is 0.04
## The df corrected root mean square of the residuals is 0.04
##
## Tucker Lewis Index of factoring reliability = 0.917
## RMSEA index = 0.145 and the 10 % confidence intervals are 0.139 0.152
## BIC = 1187.94
Item Response Theory Output for MR4
df %>%
  select(c(1, 3, 4, 5)) %>%
  irt.fa %T>% # T-pipe
  plot(main = "C2C Aggregated Factor : MR4") %>%
  print()
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = .)
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## NCSS_P1 0.35 0.58 0.75 0.77 0.63 0.38 0.18
## NCSS_P3 0.15 0.21 0.26 0.29 0.29 0.25 0.19
## NCSS_P4 0.37 0.84 1.04 1.09 1.16 1.02 0.51
## NCSS_P5 0.02 0.17 4.93 0.33 2.44 1.22 0.47
## Test Info 0.89 1.80 6.97 2.48 4.51 2.87 1.35
## SEM 1.06 0.75 0.38 0.63 0.47 0.59 0.86
## Reliability -0.12 0.44 0.86 0.60 0.78 0.65 0.26
##
## Factor analysis with Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
## fm = fm)
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 2 and the objective function was 0.01
## The number of observations was 1924 with Chi Square = 11.09 with prob < 0.0039
##
## The root mean square of the residuals (RMSA) is 0.01
## The df corrected root mean square of the residuals is 0.02
##
## Tucker Lewis Index of factoring reliability = 0.993
## RMSEA index = 0.049 and the 10 % confidence intervals are 0.023 0.078
## BIC = -4.04
Item Response Theory Output for MR5
df %>%
  select(c(24, 26, 27, 28)) %>%
  irt.fa %T>% # T-pipe
  plot(main = "C2C Aggregated Factor : MR5") %>%
  print()
## Item Response Analysis using Factor Analysis
##
## Call: irt.fa(x = .)
## Item Response Analysis using Factor Analysis
##
## Summary information by factor and item
## Factor = 1
## -3 -2 -1 0 1 2 3
## AT_SC1 0.18 0.28 0.38 0.43 0.42 0.34 0.24
## AT_SC3 0.25 0.65 1.00 1.10 1.16 0.90 0.38
## AT_SC4 0.23 0.36 0.49 0.55 0.51 0.39 0.25
## AT_SC5 0.14 0.17 0.20 0.21 0.21 0.18 0.15
## Test Info 0.79 1.47 2.07 2.29 2.30 1.82 1.01
## SEM 1.12 0.82 0.70 0.66 0.66 0.74 0.99
## Reliability -0.26 0.32 0.52 0.56 0.57 0.45 0.01
##
## Factor analysis with Call: fa(r = r, nfactors = nfactors, n.obs = n.obs, rotate = rotate,
## fm = fm)
##
## Test of the hypothesis that 1 factor is sufficient.
## The degrees of freedom for the model is 2 and the objective function was 0.06
## The number of observations was 1924 with Chi Square = 106.46 with prob < 7.6e-24
##
## The root mean square of the residuals (RMSA) is 0.06
## The df corrected root mean square of the residuals is 0.1
##
## Tucker Lewis Index of factoring reliability = 0.794
## RMSEA index = 0.165 and the 10 % confidence intervals are 0.139 0.192
## BIC = 91.33
Define the factors for the CFA model
cfa.model <- 'MR1L =~ AT_PT1 + AT_PT2 + AT_PT3 + AT_WT5 + AT_CT1 + AT_CT2 + AT_CT3 + AT_CT4 + AT_LA1 + AT_LA2 + AT_LA3 + AT_SC6 + ST_SD1 + ST_SD2 + ST_SD3 + ST_SD4
MR2L =~ NCSS_P2 + AT_WT2 + AT_WT3 + AT_WT4 + AT_LA4
MR3L =~ AT_WT1 + AT_CT5 + AT_LA5 + AT_SC2 + ST_SD5 + ST_SR2 + ST_SR1 + ST_SR3 + ST_SR4 + ST_SR5
MR4L =~ NCSS_P1 + NCSS_P3 + NCSS_P4 + NCSS_P5
MR5L =~ AT_SC1 + AT_SC3 + AT_SC4 + AT_SC5'
Fit the CFA Model
Check the results. For a quick check, look at “User Model versus Baseline Model”: the CFI and TLI reported there should be above 0.9, or above 0.95 by a stricter standard.
## lavaan 0.6-9 ended normally after 91 iterations
##
## Estimator ML
## Optimization method NLMINB
## Number of model parameters 88
##
## Number of observations 1924
##
## Model Test User Model:
##
## Test statistic 11636.334
## Degrees of freedom 692
## P-value (Chi-square) 0.000
##
## Model Test Baseline Model:
##
## Test statistic 60703.928
## Degrees of freedom 741
## P-value 0.000
##
## User Model versus Baseline Model:
##
## Comparative Fit Index (CFI) 0.817
## Tucker-Lewis Index (TLI) 0.805
##
## Loglikelihood and Information Criteria:
##
## Loglikelihood user model (H0) -101202.716
## Loglikelihood unrestricted model (H1) -95384.549
##
## Akaike (AIC) 202581.432
## Bayesian (BIC) 203070.902
## Sample-size adjusted Bayesian (BIC) 202791.325
##
## Root Mean Square Error of Approximation:
##
## RMSEA 0.091
## 90 Percent confidence interval - lower 0.089
## 90 Percent confidence interval - upper 0.092
## P-value RMSEA <= 0.05 0.000
##
## Standardized Root Mean Square Residual:
##
## SRMR 0.103
##
## Parameter Estimates:
##
## Standard errors Standard
## Information Expected
## Information saturated (h1) model Structured
##
## Latent Variables:
## Estimate Std.Err z-value P(>|z|)
## MR1L =~
## AT_PT1 1.000
## AT_PT2 1.018 0.025 40.916 0.000
## AT_PT3 1.220 0.026 47.354 0.000
## AT_WT5 0.693 0.025 28.136 0.000
## AT_CT1 0.756 0.024 31.508 0.000
## AT_CT2 0.814 0.024 34.155 0.000
## AT_CT3 1.188 0.028 42.885 0.000
## AT_CT4 0.385 0.026 14.690 0.000
## AT_LA1 0.823 0.025 32.675 0.000
## AT_LA2 0.424 0.024 17.989 0.000
## AT_LA3 0.640 0.024 26.744 0.000
## AT_SC6 1.194 0.026 45.286 0.000
## ST_SD1 1.238 0.026 47.311 0.000
## ST_SD2 1.239 0.025 48.806 0.000
## ST_SD3 1.206 0.025 47.491 0.000
## ST_SD4 0.714 0.024 29.903 0.000
## MR2L =~
## NCSS_P2 1.000
## AT_WT2 3.396 0.506 6.707 0.000
## AT_WT3 -3.831 0.564 -6.793 0.000
## AT_WT4 4.009 0.589 6.806 0.000
## AT_LA4 -3.900 0.578 -6.750 0.000
## MR3L =~
## AT_WT1 1.000
## AT_CT5 0.780 0.023 34.374 0.000
## AT_LA5 1.005 0.024 42.029 0.000
## AT_SC2 0.979 0.027 36.257 0.000
## ST_SD5 0.686 0.023 29.992 0.000
## ST_SR2 1.258 0.023 54.258 0.000
## ST_SR1 1.178 0.023 51.893 0.000
## ST_SR3 1.129 0.021 53.474 0.000
## ST_SR4 1.089 0.022 49.657 0.000
## ST_SR5 0.711 0.020 35.829 0.000
## MR4L =~
## NCSS_P1 1.000
## NCSS_P3 0.668 0.035 19.186 0.000
## NCSS_P4 1.155 0.035 33.049 0.000
## NCSS_P5 1.457 0.041 35.784 0.000
## MR5L =~
## AT_SC1 1.000
## AT_SC3 1.651 0.075 21.968 0.000
## AT_SC4 0.980 0.055 17.901 0.000
## AT_SC5 0.621 0.048 12.811 0.000
##
## Covariances:
## Estimate Std.Err z-value P(>|z|)
## MR1L ~~
## MR2L -0.037 0.008 -4.778 0.000
## MR3L 0.870 0.040 21.630 0.000
## MR4L 0.451 0.028 16.184 0.000
## MR5L 0.528 0.031 16.948 0.000
## MR2L ~~
## MR3L -0.141 0.022 -6.529 0.000
## MR4L -0.096 0.015 -6.401 0.000
## MR5L 0.003 0.004 0.866 0.386
## MR3L ~~
## MR4L 0.724 0.036 19.955 0.000
## MR5L 0.407 0.028 14.540 0.000
## MR4L ~~
## MR5L 0.189 0.018 10.273 0.000
##
## Variances:
## Estimate Std.Err z-value P(>|z|)
## .AT_PT1 0.640 0.022 29.436 0.000
## .AT_PT2 0.592 0.020 29.243 0.000
## .AT_PT3 0.391 0.014 27.147 0.000
## .AT_WT5 0.926 0.030 30.492 0.000
## .AT_CT1 0.802 0.026 30.297 0.000
## .AT_CT2 0.728 0.024 30.097 0.000
## .AT_CT3 0.652 0.023 28.823 0.000
## .AT_CT4 1.303 0.042 30.901 0.000
## .AT_LA1 0.854 0.028 30.215 0.000
## .AT_LA2 1.012 0.033 30.837 0.000
## .AT_LA3 0.901 0.029 30.557 0.000
## .AT_SC6 0.495 0.018 28.097 0.000
## .ST_SD1 0.405 0.015 27.171 0.000
## .ST_SD2 0.321 0.012 26.155 0.000
## .ST_SD3 0.374 0.014 27.068 0.000
## .ST_SD4 0.833 0.027 30.397 0.000
## .NCSS_P2 1.340 0.044 30.803 0.000
## .AT_WT2 0.903 0.033 27.301 0.000
## .AT_WT3 0.604 0.025 23.834 0.000
## .AT_WT4 0.564 0.025 22.587 0.000
## .AT_LA4 0.913 0.035 26.145 0.000
## .AT_WT1 0.641 0.022 28.872 0.000
## .AT_CT5 0.909 0.030 30.097 0.000
## .AT_LA5 0.792 0.027 29.262 0.000
## .AT_SC2 1.223 0.041 29.941 0.000
## .ST_SD5 1.017 0.033 30.383 0.000
## .ST_SR2 0.331 0.014 24.389 0.000
## .ST_SR1 0.403 0.015 26.242 0.000
## .ST_SR3 0.300 0.012 25.114 0.000
## .ST_SR4 0.447 0.016 27.351 0.000
## .ST_SR5 0.668 0.022 29.979 0.000
## .NCSS_P1 0.802 0.029 27.892 0.000
## .NCSS_P3 1.249 0.041 30.187 0.000
## .NCSS_P4 0.513 0.022 23.611 0.000
## .NCSS_P5 0.259 0.023 11.505 0.000
## .AT_SC1 1.044 0.037 28.204 0.000
## .AT_SC3 0.519 0.036 14.575 0.000
## .AT_SC4 0.988 0.035 28.159 0.000
## .AT_SC5 1.188 0.039 30.088 0.000
## MR1L 1.089 0.052 20.845 0.000
## MR2L 0.041 0.012 3.435 0.001
## MR3L 1.357 0.062 22.042 0.000
## MR4L 0.761 0.045 16.887 0.000
## MR5L 0.433 0.037 11.692 0.000
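The headline fit indices can be recomputed from the two chi-square tests in the output, which makes it easier to see what they measure. The formulas are the standard definitions of CFI, TLI and RMSEA; that lavaan uses N rather than N - 1 in the RMSEA denominator is an assumption, and it does not change the rounded result here.

```r
n     <- 1924       # number of observations
chi_m <- 11636.334  # user model chi-square
df_m  <- 692
chi_b <- 60703.928  # baseline model chi-square
df_b  <- 741

# CFI: 1 minus the ratio of model to baseline non-centrality
cfi <- 1 - max(chi_m - df_m, 0) / max(chi_b - df_b, chi_m - df_m, 0)

# TLI: compares the chi-square/df ratios of the baseline and user models
tli <- ((chi_b / df_b) - (chi_m / df_m)) / ((chi_b / df_b) - 1)

# RMSEA: misfit per degree of freedom, scaled by sample size
rmsea <- sqrt(max(chi_m - df_m, 0) / (df_m * n))

round(c(CFI = cfi, TLI = tli, RMSEA = rmsea), 3)
```

These reproduce the reported CFI of 0.817, TLI of 0.805 and RMSEA of 0.091. Both CFI and TLI fall short of the 0.9 benchmark, so the five-factor model as specified does not fit these data well.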
Congratulations!
You have completed session 1 of the S3729C Data Analytics Seminar.