Confirmatory Factor Analysis for EPSY 906 (SEM)

Checking Missing Packages and Install and/or Load them

list.of.packages <- c("psych", "psychometric", "sjstats", "semPlot", "lavaan", "ggplot2", "reshape2")

new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
lapply(list.of.packages, require, character.only=TRUE)

## [[1]]
## [1] TRUE
## 
## [[2]]
## [1] TRUE
## 
## [[3]]
## [1] TRUE
## 
## [[4]]
## [1] TRUE
## 
## [[5]]
## [1] TRUE
## 
## [[6]]
## [1] TRUE
## 
## [[7]]
## [1] TRUE

Importing the Data

dat <- read.table(file.choose(), header = TRUE)
head(dat)

##     Lab  HW PopQuiz Exam1 Exam2 FinalExam
## 1  9.95 9.4    8.13    72    90        62
## 2  7.71 0.0    8.00    63    26        78
## 3 10.00 9.5    8.13    79    55        80
## 4  9.95 9.4    8.88    89    77        79
## 5  9.95 9.2    8.50    83    84        67
## 6 10.00 9.9    9.63    96    94        93

Begin by summarizing the construct(s) being measured and the items themselves, including how many there are and their response options. Also provide your sample size and briefly describe the sample. Provide all relevant modeling info: program, estimator, how each model was identified, how models will be compared, and what criteria you are using to indicate “good fit” (i.e., cut-off values) both globally and locally. The idea is that a reader should be able to replicate your analyses given the information provided. (1 point)

Data was taken from Rencher & Christensen, 2012, chapter 14. This data set includes data from a university business statistics class for N = 94 students. Scores were obtained for laboratory assignments (lab), homework assignments (HW), pop quizzes (PopQuiz), midterm exams #1 and #2 (Exam1 and Exam2) as well as the final exam (FinalExam). There are two hypothesis to be tested here:

All these items load on one factor “Statistics Skills.”
There are two factors being estimated in the data. Items lab, HW, and PopQuiz load on the first factor “daily effort” and the remaining items load on another factor “knowledge mastery”. Each model will be estimated and then a Chi square test will be used to see if there is a significant difference between the two and whether adding another factor significantly fit the data better. Model fit indices; such as CFI, TLI, RMSEA, SRMR; were used to evaluate the overall model fit of each model.

Provide and reference a table of descriptive statistics for your items, including columns for N, mean, standard deviation, minimum, maximum, and item-total correlation. Also provide and reference a Pearson correlation matrix for your items (organized by dimension if your indicators belong to more than a single dimension). Note that you should be able to paste output directly into excel in order to make these tables easily (i.e., no typing numbers). Comment on the difficulty and discrimination of your items using these CTT results. Also comment on the size and heterogeneity of your inter-item correlations. (2 points)

Summary Statistics

#summary(dat)
correlation <- item.exam(dat)

DataSummary<- data.frame(describe(dat))
DataSummary["Item.total"] <- correlation$Item.total
cbind(DataSummary[,c(2,3,4,8,9,14)], correlation$Difficulty, correlation$Item.Reliab)

##            n      mean         sd   min    max Item.total
## Lab       94  9.637766  0.9736553  2.00  10.00  0.3818438
## HW        94  8.815000  1.7782544  0.00   9.92  0.5273897
## PopQuiz   94  7.899681  1.5235425  1.25  10.00  0.5056022
## Exam1     94 79.127660 12.7988861 44.00 100.00  0.8302881
## Exam2     94 70.053191 19.6995535  0.00 100.00  0.8919697
## FinalExam 94 74.351064 14.3307380 31.00 100.00  0.8466486
##           correlation$Difficulty correlation$Item.Reliab
## Lab                     9.637766               0.3698013
## HW                      8.815000               0.9328313
## PopQuiz                 7.899681               0.7661981
## Exam1                  79.127660              10.5700857
## Exam2                  70.053191              17.4776896
## FinalExam              74.351064              12.0683895

#correlation$Difficulty
#correlation$Item.Reliab
#DataSummary$
#round(DataSummary[,c(2,3,4,8,9,14)], digits = 3)

Looking at the table above, the LAB and HW seem to be easy for students and most of the students are scoring very highly in them resulting in a difficulty of 9.64 for the LAB and 8.82 for the HW where the max value is 10. On the other hand, the PopQuiz seem to be more difficult for the student as the difficulty=7.9 where the max=10. Finally, the midterm exams as well as the final exam seem to be the most challenging with difficulty=79.13, 70.05, and 74.35 for Exam1, Exaxm2, and FinalExam, respectively. The following histograms will show also how the results of Lab and HW are negatively skewed, while the other four items are approximating normality.

Data Distribution

par(mfcol = c(2, 3)) 
for (i in 1:6) {

  hist(dat[ , i], main = paste("Plot for", names(dat)[i]), col="orange")
}

##Data Correlation

cormatrix<- round(cor(dat, method = "pearson"), digits = 3)
upper<-cormatrix
upper[upper.tri(cormatrix)]<-""
upper<-as.data.frame(upper)
upper

##             Lab    HW PopQuiz Exam1 Exam2 FinalExam
## Lab           1                                    
## HW         0.65     1                              
## PopQuiz   0.143 0.373       1                      
## Exam1     0.251 0.353   0.361     1                
## Exam2     0.301 0.501   0.445 0.599     1          
## FinalExam 0.328 0.349     0.4 0.641   0.6         1

The inter-item correlation atrix shown above shows that the correlation between the items range from weak (r=.143) to medium (r=.251) to high (r=.65).

Estimate a CFA model that corresponds to your hypothesized dimensionality. Report the relevant fit statistics and describe by which indices good fit has been achieved globally. Provide the range of effect sizes across indicators (i.e., standardized loadings). Examine and describe any local misfit. If your model fit is not adequate, considering its sources of local misfit, re-specify your model to try to improve fit. Note that any model modifications should also be theoretically defensible, so provide a rationale for these modifications. Describe the model modification process you followed, and conduct any relevant model comparisons to support your modifications. (5 points)

I started the analysis of the model with only one factor where all the items load on it. MLR estimation method was used for all analyses as well as a zscored factor model identification where the factor mean was set to zero and factor vriance to one. The results of the model shown below were calculated using Lavaan R Package. The fit indeces, comparative fir index (CFI) and Tucker-Lewis Index (TLI), are computed using ratios of the chi-square model and the chi-square null model while at the same time taking into account their degrees of freedom. All of these indices have values that range between approximately 0 and 1.0, and the more they approach 1 the better with a cutoff score of approximately .95. With that said, according to our model below, the CFI for this model is .809 and the TLI is .681, which indicates a poor model global fit. Additionally, the Root mean square error of approximation (RMSEA) ranges from 0 to 1, with smaller values indicating better model fit and a value of .05 or less is indicative of acceptable model fit. Similarly, the RMSEA for this model is .211 and the SMRM is .086, which confirm the poor fit indces of the CFI and TLI. Furhtermore, the normailezed residual matrix reveals that there is no local misfit. Accordingly, we reject the first hypothesis that all items load on only one factor.

model0 <- "
  StatistcsSkill =~ Lab+HW + PopQuiz + Exam1 + Exam2 + FinalExam
  Lab~1; HW~1; PopQuiz~1; Exam1~1; Exam2~1; FinalExam~1;
  Lab~~Lab; HW~~HW; PopQuiz~~PopQuiz; Exam1~~Exam1; Exam2~~Exam2; FinalExam~~FinalExam;
  
StatistcsSkill~0
StatistcsSkill~~StatistcsSkill
"
model0.fit <- lavaan(model = model0, data = dat, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(model0.fit, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)

## lavaan (0.5-23.1097) converged normally after  66 iterations
## 
##   Number of observations                            94
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML      Robust
##   Minimum Function Test Statistic               49.207      38.488
##   Degrees of freedom                                 9           9
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.278
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              217.586     164.568
##   Degrees of freedom                                15          15
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.802       0.803
##   Tucker-Lewis Index (TLI)                       0.669       0.671
## 
##   Robust Comparative Fit Index (CFI)                         0.809
##   Robust Tucker-Lewis Index (TLI)                            0.682
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1574.335   -1574.335
##   Scaling correction factor                                  2.761
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)      -1549.732   -1549.732
##   Scaling correction factor                                  2.267
##     for the MLR correction
## 
##   Number of free parameters                         18          18
##   Akaike (AIC)                                3184.670    3184.670
##   Bayesian (BIC)                              3230.450    3230.450
##   Sample-size adjusted Bayesian (BIC)         3173.624    3173.624
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.218       0.187
##   90 Percent Confidence Interval          0.161  0.279       0.135  0.242
##   P-value RMSEA <= 0.05                          0.000       0.000
## 
##   Robust RMSEA                                               0.211
##   90 Percent Confidence Interval                             0.145  0.282
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.086       0.086
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   StatistcsSkill =~                                                      
##     Lab                0.432    0.233    1.854    0.064    0.432    0.446
##     HW                 1.043    0.329    3.170    0.002    1.043    0.590
##     PopQuiz            0.799    0.219    3.641    0.000    0.799    0.527
##     Exam1              9.465    1.215    7.792    0.000    9.465    0.743
##     Exam2             15.678    2.485    6.309    0.000   15.678    0.800
##     FinalExam         10.874    1.609    6.758    0.000   10.874    0.763
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Lab               9.638    0.100   96.485    0.000    9.638    9.952
##    .HW                8.815    0.182   48.319    0.000    8.815    4.984
##    .PopQuiz           7.900    0.156   50.541    0.000    7.900    5.213
##    .Exam1            79.128    1.313   60.262    0.000   79.128    6.216
##    .Exam2            70.053    2.021   34.662    0.000   70.053    3.575
##    .FinalExam        74.351    1.470   50.571    0.000   74.351    5.216
##     StatistcsSkill    0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Lab               0.751    0.446    1.683    0.092    0.751    0.801
##    .HW                2.041    0.647    3.152    0.002    2.041    0.652
##    .PopQuiz           1.658    0.476    3.481    0.000    1.658    0.722
##    .Exam1            72.480   14.968    4.842    0.000   72.480    0.447
##    .Exam2           138.140   31.282    4.416    0.000  138.140    0.360
##    .FinalExam        84.934   20.827    4.078    0.000   84.934    0.418
##     StatistcsSkill    1.000                               1.000    1.000
## 
## R-Square:
##                    Estimate
##     Lab               0.199
##     HW                0.348
##     PopQuiz           0.278
##     Exam1             0.553
##     Exam2             0.640
##     FinalExam         0.582

resid(model0.fit, type="normalized")

## $type
## [1] "normalized"
## 
## $cov
##           Lab    HW     PopQuz Exam1  Exam2  FnlExm
## Lab        0.000                                   
## HW         0.982  0.000                            
## PopQuiz   -0.689  0.363  0.000                     
## Exam1     -0.637 -0.659 -0.238  0.000              
## Exam2     -0.342  0.151  0.141  0.031  0.000       
## FinalExam -0.055 -0.714 -0.025  0.681 -0.082  0.000
## 
## $mean
##       Lab        HW   PopQuiz     Exam1     Exam2 FinalExam 
##         0         0         0         0         0         0

modificationindices(object= model0.fit, sort.=TRUE)

##        lhs op       rhs     mi mi.scaled     epc sepc.lv sepc.all sepc.nox
## 21     Lab ~~        HW 31.804    24.876   0.782   0.782    0.457    0.457
## 34   Exam1 ~~ FinalExam  7.401     5.789  36.113  36.113    0.199    0.199
## 29      HW ~~ FinalExam  6.040     4.724  -4.372  -4.372   -0.173   -0.173
## 27      HW ~~     Exam1  3.764     2.944  -3.094  -3.094   -0.137   -0.137
## 23     Lab ~~     Exam1  2.425     1.897  -1.413  -1.413   -0.115   -0.115
## 24     Lab ~~     Exam2  1.731     1.354  -1.790  -1.790   -0.094   -0.094
## 22     Lab ~~   PopQuiz  1.591     1.244  -0.155  -0.155   -0.106   -0.106
## 26      HW ~~   PopQuiz  0.944     0.738   0.204   0.204    0.076    0.076
## 28      HW ~~     Exam2  0.671     0.525   2.001   2.001    0.058    0.058
## 30 PopQuiz ~~     Exam1  0.424     0.332  -0.904  -0.904   -0.047   -0.047
## 31 PopQuiz ~~     Exam2  0.350     0.274   1.244   1.244    0.042    0.042
## 35   Exam2 ~~ FinalExam  0.247     0.193 -10.704 -10.704   -0.038   -0.038
## 25     Lab ~~ FinalExam  0.067     0.052  -0.259  -0.259   -0.019   -0.019
## 33   Exam1 ~~     Exam2  0.036     0.028   3.554   3.554    0.014    0.014
## 32 PopQuiz ~~ FinalExam  0.003     0.002  -0.078  -0.078   -0.004   -0.004

semPaths(object = model0.fit, what = "std")

#The following is the two-factor model Since the one factor model did not fit the data well, in the following model I ran the two factor model. The overlall model fit improved marginally but the fit indeces (CFI & TLI) are still below the cutoff scores.

## lavaan (0.5-23.1097) converged normally after 138 iterations
## 
##   Number of observations                            94
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML      Robust
##   Minimum Function Test Statistic               45.966      36.590
##   Degrees of freedom                                 9           9
##   P-value (Chi-square)                           0.000       0.000
##   Scaling correction factor                                  1.256
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              217.586     164.568
##   Degrees of freedom                                15          15
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.818       0.816
##   Tucker-Lewis Index (TLI)                       0.696       0.693
## 
##   Robust Comparative Fit Index (CFI)                         0.825
##   Robust Tucker-Lewis Index (TLI)                            0.708
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1572.715   -1572.715
##   Scaling correction factor                                  2.772
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)      -1549.732   -1549.732
##   Scaling correction factor                                  2.267
##     for the MLR correction
## 
##   Number of free parameters                         18          18
##   Akaike (AIC)                                3181.429    3181.429
##   Bayesian (BIC)                              3227.209    3227.209
##   Sample-size adjusted Bayesian (BIC)         3170.383    3170.383
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.209       0.181
##   90 Percent Confidence Interval          0.152  0.271       0.128  0.237
##   P-value RMSEA <= 0.05                          0.000       0.000
## 
##   Robust RMSEA                                               0.202
##   90 Percent Confidence Interval                             0.137  0.273
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.215       0.215
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                       Estimate  Std.Err  z-value  P(>|z|)   Std.lv
##   DailyEffort =~                                                  
##     Lab                  0.483    0.232    2.083    0.037    0.483
##     HW                   2.306    1.000    2.305    0.021    2.306
##     PopQuiz              0.434    0.300    1.446    0.148    0.434
##   KnowledgeMastery =~                                             
##     Exam1               10.186    1.134    8.983    0.000   10.186
##     Exam2               14.673    2.188    6.707    0.000   14.673
##     FinalExam           11.423    1.388    8.229    0.000   11.423
##   Std.all
##          
##     0.499
##     1.303
##     0.286
##          
##     0.800
##     0.749
##     0.801
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Lab               9.638    0.100   96.484    0.000    9.638    9.952
##    .HW                8.815    0.182   48.319    0.000    8.815    4.984
##    .PopQuiz           7.900    0.156   50.541    0.000    7.900    5.213
##    .Exam1            79.128    1.313   60.262    0.000   79.128    6.216
##    .Exam2            70.053    2.021   34.662    0.000   70.053    3.575
##    .FinalExam        74.351    1.470   50.571    0.000   74.351    5.216
##     DailyEffort       0.000                               0.000    0.000
##     KnowledgeMstry    0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .Lab               0.705    0.515    1.368    0.171    0.705    0.751
##    .HW               -2.187    4.163   -0.525    0.599   -2.187   -0.699
##    .PopQuiz           2.108    0.568    3.711    0.000    2.108    0.918
##    .Exam1            58.318   14.197    4.108    0.000   58.318    0.360
##    .Exam2           168.634   37.779    4.464    0.000  168.634    0.439
##    .FinalExam        72.709   17.902    4.062    0.000   72.709    0.358
##     DailyEffort       1.000                               1.000    1.000
##     KnowledgeMstry    1.000                               1.000    1.000
## 
## R-Square:
##                    Estimate
##     Lab               0.249
##     HW                   NA
##     PopQuiz           0.082
##     Exam1             0.640
##     Exam2             0.561
##     FinalExam         0.642

semPaths(object = model1.fit, what = "std")

###Check the difference between the two models. I then compared both models to see if the two model provides a better fit than the one factor model. According to the Chi Square Different Test, the better alternative is to choose the model with the two factors.

anova(model0.fit,model1.fit)

## Scaled Chi Square Difference Test (method = "satorra.bentler.2001")
## 
##            Df    AIC    BIC  Chisq Chisq diff Df diff Pr(>Chisq)
## model0.fit  9 3184.7 3230.4 49.207                              
## model1.fit  9 3181.4 3227.2 45.966                  0

Final Model

Before a final decision was made, I returned to check the the range of effect sizes across indicators seem to have some problems in both models. In the one-factor model, Lab does not significantly load on the factor, while Lab as well as PopQuiz do not load significantly in the DailyEffort factor they were assigned to. This might be due to the fact that the method of measuring the HW is different from measuring tests. Similarly, the PopQuiz is different from Lab and HW due to the fact these quizzes are unannounced and are influenced by other variables, such as anxiety, mode etc. According to this finding, I decided to re-run the first model with only one factor and removing the Lab since it is not a significant item. Surprisingly, the model fit indeces as well as the absolute model fit indeces showed that the model fit the data PERFECTLY. Also, the test statistic showed that the model cannot be improved any better. Below, we can see that the fit indeces CFI=1, TLI=1, RMSEA=0, and SRMR=0.036, which are all meet and exceed the requirements of their cutoff scores.

FinalModel <- "
  StatistcsSkill =~ HW + PopQuiz + Exam1 + Exam2 + FinalExam
  HW~1; PopQuiz~1; Exam1~1; Exam2~1; FinalExam~1;
  HW~~HW; PopQuiz~~PopQuiz; Exam1~~Exam1; Exam2~~Exam2; FinalExam~~FinalExam;
  
StatistcsSkill~0
StatistcsSkill~~StatistcsSkill
"
FinalModel.Fit <- lavaan(model = FinalModel, data = dat, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(FinalModel.Fit, fit.measures = TRUE, rsquare = TRUE, standardized = TRUE)

## lavaan (0.5-23.1097) converged normally after  47 iterations
## 
##   Number of observations                            94
## 
##   Number of missing patterns                         1
## 
##   Estimator                                         ML      Robust
##   Minimum Function Test Statistic                7.568       4.187
##   Degrees of freedom                                 5           5
##   P-value (Chi-square)                           0.182       0.523
##   Scaling correction factor                                  1.808
##     for the Yuan-Bentler correction (Mplus variant)
## 
## Model test baseline model:
## 
##   Minimum Function Test Statistic              159.326     105.317
##   Degrees of freedom                                10          10
##   P-value                                        0.000       0.000
## 
## User model versus baseline model:
## 
##   Comparative Fit Index (CFI)                    0.983       1.000
##   Tucker-Lewis Index (TLI)                       0.966       1.017
## 
##   Robust Comparative Fit Index (CFI)                         1.000
##   Robust Tucker-Lewis Index (TLI)                            1.020
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)              -1452.278   -1452.278
##   Scaling correction factor                                  1.711
##     for the MLR correction
##   Loglikelihood unrestricted model (H1)      -1448.493   -1448.493
##   Scaling correction factor                                  1.735
##     for the MLR correction
## 
##   Number of free parameters                         15          15
##   Akaike (AIC)                                2934.555    2934.555
##   Bayesian (BIC)                              2972.705    2972.705
##   Sample-size adjusted Bayesian (BIC)         2925.350    2925.350
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.074       0.000
##   90 Percent Confidence Interval          0.000  0.174       0.000  0.102
##   P-value RMSEA <= 0.05                          0.295       0.736
## 
##   Robust RMSEA                                               0.000
##   90 Percent Confidence Interval                             0.000  0.176
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.036       0.036
## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   StatistcsSkill =~                                                      
##     HW                 0.950    0.293    3.241    0.001    0.950    0.537
##     PopQuiz            0.808    0.211    3.827    0.000    0.808    0.533
##     Exam1              9.678    1.193    8.112    0.000    9.678    0.760
##     Exam2             15.798    2.498    6.324    0.000   15.798    0.806
##     FinalExam         10.960    1.472    7.445    0.000   10.960    0.769
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .HW                8.815    0.182   48.319    0.000    8.815    4.984
##    .PopQuiz           7.900    0.156   50.541    0.000    7.900    5.213
##    .Exam1            79.128    1.313   60.262    0.000   79.128    6.216
##    .Exam2            70.053    2.021   34.662    0.000   70.053    3.575
##    .FinalExam        74.351    1.470   50.571    0.000   74.351    5.216
##     StatistcsSkill    0.000                               0.000    0.000
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .HW                2.225    0.717    3.104    0.002    2.225    0.711
##    .PopQuiz           1.643    0.471    3.486    0.000    1.643    0.715
##    .Exam1            68.409   14.317    4.778    0.000   68.409    0.422
##    .Exam2           134.356   31.640    4.246    0.000  134.356    0.350
##    .FinalExam        83.054   18.492    4.491    0.000   83.054    0.409
##     StatistcsSkill    1.000                               1.000    1.000
## 
## R-Square:
##                    Estimate
##     HW                0.289
##     PopQuiz           0.285
##     Exam1             0.578
##     Exam2             0.650
##     FinalExam         0.591

resid(FinalModel.Fit, type="normalized")

## $type
## [1] "normalized"
## 
## $cov
##           HW     PopQuz Exam1  Exam2  FnlExm
## HW         0.000                            
## PopQuiz    0.504  0.000                     
## Exam1     -0.427 -0.341  0.000              
## Exam2      0.352  0.091 -0.101  0.000       
## FinalExam -0.455 -0.111  0.522 -0.158  0.000
## 
## $mean
##        HW   PopQuiz     Exam1     Exam2 FinalExam 
##         0         0         0         0         0

modificationindices(object= FinalModel.Fit, sort.=TRUE)

##        lhs op       rhs    mi mi.scaled     epc sepc.lv sepc.all sepc.nox
## 26   Exam1 ~~ FinalExam 6.175     3.416  36.323  36.323    0.200    0.200
## 20      HW ~~     Exam2 3.373     1.866   4.579   4.579    0.132    0.132
## 21      HW ~~ FinalExam 2.240     1.239  -2.715  -2.715   -0.108   -0.108
## 18      HW ~~   PopQuiz 1.648     0.912   0.277   0.277    0.103    0.103
## 19      HW ~~     Exam1 1.543     0.853  -2.015  -2.015   -0.089   -0.089
## 27   Exam2 ~~ FinalExam 1.264     0.699 -27.037 -27.037   -0.097   -0.097
## 22 PopQuiz ~~     Exam1 1.001     0.554  -1.392  -1.392   -0.072   -0.072
## 25   Exam1 ~~     Exam2 0.553     0.306 -15.748 -15.748   -0.063   -0.063
## 23 PopQuiz ~~     Exam2 0.161     0.089   0.857   0.857    0.029    0.029
## 24 PopQuiz ~~ FinalExam 0.056     0.031  -0.368  -0.368   -0.017   -0.017

semPaths(object = FinalModel.Fit, what = "std")

> 4. Once your fit is as good as it is going to get and your model is still theoretically defensible, you can call it a final model. Estimate omega for each dimension. Provide and reference a table of ALL estimate model parameters, including columns for unstandardized estimates, their standard errors, and standardized estimates. Use the “text to columns” feature in the Data menu of Excel to make this easier, but make sure each parameter is clearly labeled (i.e., do not leave the impoverished labels used by Mplus). (2 points)

After doing the analysis, Omega=0.834, which basically says that reliabilities for sum scores is pretty good.

#Calculating Omege

#In words, Omega = Var(Factor)* (Sum of loadings)^2 / [ Var(Factor)* (Sum of loadings)^2 + Sum of error variances + 2* Sum of error covariances]



OmegaSyntax = "

StatistcsSkill =~ L1*HW + L2*PopQuiz + L3*Exam1 + L4*Exam2 + L5*FinalExam
  HW~1; PopQuiz~1; Exam1~1; Exam2~1; FinalExam~1;
  HW~~E1*HW; PopQuiz~~E2*PopQuiz; Exam1~~E3*Exam1; Exam2~~E4*Exam2; FinalExam~~E5*FinalExam;
  
OmegaGPS := ((L1 + L2 + L3 + L4 + L5 )^2) / (((L1 + L2 + L3 + L4 + L5)^2) + E1 + E2 + E3 + E4 + E5  )


StatistcsSkill~0
StatistcsSkill~~StatistcsSkill

"
FinalModelOmega = sem(model = OmegaSyntax, data = dat, estimator = "MLR", mimic = "mplus", std.lv = TRUE)
summary(FinalModelOmega, fit.measures = FALSE, rsquare = FALSE, standardized = FALSE, header = FALSE)

## 
## Parameter Estimates:
## 
##   Information                                 Observed
##   Standard Errors                   Robust.huber.white
## 
## Latent Variables:
##                     Estimate  Std.Err  z-value  P(>|z|)
##   StatistcsSkill =~                                    
##     HW        (L1)     0.950    0.293    3.241    0.001
##     PopQuiz   (L2)     0.808    0.211    3.827    0.000
##     Exam1     (L3)     9.678    1.193    8.112    0.000
##     Exam2     (L4)    15.798    2.498    6.324    0.000
##     FinalExam (L5)    10.960    1.472    7.445    0.000
## 
## Intercepts:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HW                8.815    0.182   48.319    0.000
##    .PopQuiz           7.900    0.156   50.541    0.000
##    .Exam1            79.128    1.313   60.262    0.000
##    .Exam2            70.053    2.021   34.662    0.000
##    .FinalExam        74.351    1.470   50.571    0.000
##     StatistcsSkill    0.000                           
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)
##    .HW        (E1)    2.225    0.717    3.104    0.002
##    .PopQuiz   (E2)    1.643    0.471    3.486    0.000
##    .Exam1     (E3)   68.409   14.317    4.778    0.000
##    .Exam2     (E4)  134.356   31.640    4.246    0.000
##    .FinalExam (E5)   83.054   18.492    4.491    0.000
##     SttstcsSk         1.000                           
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)
##     OmegaGPS          0.834    0.027   31.337    0.000

Provide and reference a histogram of your sample’s factor score distribution. Use the function I created within Example04. Also provide and reference a factor-response plot that shows the predicted indicator responses at ±2 SD of the factor for the two items with the lowest and highest CFA difficulty (see the Example 4 R syntax for help). Comment on how plausible a linear model predicting the indicator responses from your factor is for your data.(3 points)

##                StatistcsSkill
## StatistcsSkill      0.8546961

## [1] -9.499456e-17

The above histogram shows that the majority of the students did well in the course and received good grades.

sumscoresP = apply(X = dat, MARGIN = 1, FUN = sum)
fscoresPI = FactorScores

plot(x = FactorScores, y = sumscoresP, xlab = "StatistcsSkill: Full CFA Factor Scores", ylab = "StatistcsSkill: Sum Score", xlim = c(-3.5, 2))
text(x = -2, y = 20, labels = paste("r =", as.character(round(x = cor(FactorScores, sumscoresP), digits = 3))))

Model-Predicted Item Responses by Factor Scores with Dashed Lines for Floor and Ceiling Effects

lavObject = FinalModel.Fit
cfaPlots = function(lavObject){
  output = inspect(object = lavObject, what = "est")
  
  #build matrix plot elements
  
  #get factor scores
  fscores = FactorScores
  
  nfactors = 1
  
  #get max observed data
  itemMax = max(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = max))
  itemMin = min(apply(X = lavObject@Data@X[[1]], MARGIN = 2, FUN = min))
    
  #get range for all scores
  factorMax = max(apply(X = fscores,MARGIN = 2, FUN = max))
  factorMin = min(apply(X = fscores, MARGIN = 2, FUN = min))
  
  #set up x values
  x = seq(factorMin, factorMax, .01)
  
  par(mfrow = c(1, nfactors))
  #make plots by factor
  factor=1
  for (factor in 1:nfactors){
    xmat = NULL
    ymat = NULL
    inames = NULL
    for (item in 1:nrow(output$lambda)){
      if (output$lambda[item, factor] != 0){
        inames = c(inames, rownames(output$lambda)[item])
        y = output$nu[item] + output$lambda[item, factor]*x
        xmat = cbind(xmat, x)
        ymat = cbind(ymat, y)
      }
      
    }
    matplot(x = xmat, y = ymat, type = "l", lwd = 5, lty=2:(ncol(xmat)+1), ylim = c(itemMin-1, itemMax+1), xlim = c(factorMin, factorMax),  
            ylab = "Predicted Item Response", xlab = colnames(output$lambda)[factor], col = 2:(ncol(xmat)+1)) 
    lines(x = c(factorMin,factorMax), y = c(itemMin, itemMin), lty = 3, lwd = 5)
    lines(x = c(factorMin,factorMax), y = c(itemMax, itemMax), lty = 3, lwd = 5)
    legend(x = -3, y = 7, legend = inames, lty = 2:(ncol(xmat)+1), lwd = 5, col = 2:(ncol(xmat)+1))
  }
  par(mfrow = c(1,1))
}
cfaPlots(lavObject = FinalModel.Fit)