I ran a few quick EFAs for the identity items and the advisor items, just to see how things were loading before I began playing with them. Mostly everything behaved as expected; notes are inserted alongside the analyses below.
The first step was running a bunch of correlation matrices. These use the data set with all NAs omitted, since by default cor() returns NA for any pair of items with missing values. Correlation plots are printed below.
library(corrplot) # provides corrplot()
corrplot(sid_cor, order = "hclust")
corrplot(eid_cor, order = "hclust")
corrplot(rid_cor, order = "hclust")
corrplot(adv_cor, order = "hclust")
corrplot(ibm_cor, order = "hclust")
corrplot(ibm_td_cor, order = "hclust")
corrplot(ibm_s_cor, order = "hclust")
corrplot(ibm_e_cor, order = "hclust")
corrplot(ibm_r_cor, order = "hclust")
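The step that builds the correlation matrices plotted above is not shown in these notes. A minimal sketch of the two usual ways to handle missing responses follows; the data frame `items` and its columns are made up for illustration, and the real item sets may have been prepared differently.

```r
# Hypothetical stand-in for one item set; the real data frames are not shown here.
set.seed(1)
items <- data.frame(
  q1 = rnorm(50),
  q2 = rnorm(50),
  q3 = rnorm(50)
)
items$q1[c(3, 10)] <- NA  # inject a few missing responses
items$q3[7] <- NA

# Default behaviour: any pair involving a variable with NAs comes back as NA.
cor_default <- cor(items)

# Option A (what the notes describe): drop every row with any NA first.
items_complete <- na.omit(items)
cor_listwise <- cor(items_complete)

# Option B: the same listwise deletion, done inside cor().
cor_listwise2 <- cor(items, use = "complete.obs")

# Option C: pairwise deletion keeps more data per pair of items,
# but each correlation may then rest on a different set of rows.
cor_pairwise <- cor(items, use = "pairwise.complete.obs")

nrow(items_complete)  # rows left after listwise deletion
```

Listwise deletion keeps every correlation on the same respondents, which is the safer input for factanal(); pairwise deletion retains more data but can produce a matrix that is not positive definite.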
SID items loaded as expected, except for #5 (I want to be recognized for my contributions to SCIENCE), which cross-loads. For future analyses, we might want to drop this item from scale calculations. We might also consider re-running this with #1 (Overall I see myself as) dropped – it’s loading on the recognition factor, but is on the low end.
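If an item does get dropped from scale calculations, one simple approach is to average only the retained items. The sketch below uses invented Likert responses and a hypothetical five-item scale named `sid_demo`; the real scale composition is an assumption here.

```r
# Hypothetical 5-item scale with made-up responses (1-5 Likert); not the study data.
set.seed(2)
sid_demo <- as.data.frame(matrix(sample(1:5, 40, replace = TRUE), ncol = 5))
names(sid_demo) <- paste0("SID", 1:5)

# Items to leave out of the scale score.
drop_items <- c("SID5")
keep_items <- setdiff(names(sid_demo), drop_items)

# Mean of the retained items for each respondent.
scale_score <- rowMeans(sid_demo[, keep_items])
head(scale_score)
```

Keeping the dropped items in a named vector makes it easy to re-run the scores with and without #1 or #5.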
##run efa for sid items
library(nFactors) # provides parallel(), nScree(), plotnScree()
####NFACTORS####
ev <- eigen(cor(sid)) # get eigenvalues
ap <- parallel(subject=nrow(sid),var=ncol(sid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
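# note: cutoff is not a factanal() argument and is silently ignored here; the 0.3 threshold is applied by print() below
EFA <- factanal(sid, factors = 3, rotation = "promax", cutoff = 0.3)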
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = sid, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## SID1 SID2 SID3 SID4 SID5 SID6 SID7 SID8 SID9 SID10 SID11 SID12
## 0.371 0.268 0.219 0.441 0.584 0.271 0.252 0.191 0.069 0.167 0.466 0.321
## SID13 SID14 SID15
## 0.341 0.444 0.316
##
## Loadings:
## Factor1 Factor2 Factor3
## SID1 0.694
## SID2 0.898
## SID3 0.907
## SID4 0.749
## SID6 0.886
## SID7 0.898
## SID8 0.912
## SID9 0.999
## SID10 0.913
## SID11 0.519
## SID12 0.838
## SID13 0.811
## SID14 0.800
## SID15 0.821
## SID5 0.362 0.417
##
## Factor1 Factor2 Factor3
## SS loadings 4.414 2.964 2.952
## Proportion Var 0.294 0.198 0.197
## Cumulative Var 0.294 0.492 0.689
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 0.485 0.613
## Factor2 0.485 1.000 0.432
## Factor3 0.613 0.432 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 502.94 on 63 degrees of freedom.
## The p-value is 7.78e-70
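The NFACTORS step above relies on the nFactors package. To make the parallel-analysis and Kaiser criteria more concrete, here is a rough base R sketch on simulated data. It compares observed eigenvalues with the 95th percentile of eigenvalues from random data, which is a common convention but an assumption here; it may not match what cent = .05 does in the parallel() call above.

```r
# Simulated two-factor data, purely for illustration (not the study data).
set.seed(3)
n <- 300
f1 <- rnorm(n)
f2 <- rnorm(n)
demo <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.6), a2 = f1 + rnorm(n, sd = 0.6), a3 = f1 + rnorm(n, sd = 0.6),
  b1 = f2 + rnorm(n, sd = 0.6), b2 = f2 + rnorm(n, sd = 0.6), b3 = f2 + rnorm(n, sd = 0.6)
)

# Observed eigenvalues of the correlation matrix.
obs_ev <- eigen(cor(demo))$values

# Eigenvalues from random normal data of the same shape, repeated `reps` times.
reps <- 100
rand_ev <- replicate(reps, {
  noise <- matrix(rnorm(n * ncol(demo)), nrow = n)
  eigen(cor(noise))$values
})

# 95th percentile of the random eigenvalues at each position.
threshold <- apply(rand_ev, 1, quantile, probs = 0.95)

# Parallel analysis: count leading eigenvalues that beat the random threshold.
n_parallel <- sum(cumprod(obs_ev > threshold))

# Kaiser rule for comparison: eigenvalues greater than 1.
n_kaiser <- sum(obs_ev > 1)

c(parallel = n_parallel, kaiser = n_kaiser)
```

When the criteria disagree, as they do for some of the item sets in these notes, it seems reasonable to fit the competing solutions and compare interpretability, which is what the re-runs below do.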
First EID run – the two-factor solution recommended by the scree test did not behave as expected. A closer look reveals there are only two items on the interest factor – did we do this for a reason, or did an item get lost in the shuffle? As a result, the interest/pc items load together on one factor, while the recognition items load on their own as expected.
##run efa for eid items
####NFACTORS####
ev <- eigen(cor(eid)) # get eigenvalues
ap <- parallel(subject=nrow(eid),var=ncol(eid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   2   1         2       2
####EFA####
EFA <- factanal(eid, factors = 2, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = eid, factors = 2, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## EID1 EID2 EID3 EID4 EID5 EID6 EID7 EID8 EID9 EID10 EID11 EID12
## 0.360 0.338 0.395 0.430 0.501 0.318 0.275 0.387 0.428 0.281 0.255 0.412
## EID13 EID14
## 0.244 0.293
##
## Loadings:
## Factor1 Factor2
## EID8 0.504 0.337
## EID9 0.612
## EID10 0.911
## EID11 0.913
## EID12 0.779
## EID13 0.956
## EID14 0.858
## EID1 0.588
## EID2 0.929
## EID3 0.694
## EID4 0.553
## EID6 0.955
## EID7 0.907
## EID5 0.483
##
## Factor1 Factor2
## SS loadings 4.843 4.143
## Proportion Var 0.346 0.296
## Cumulative Var 0.346 0.642
##
## Factor Correlations:
## Factor1 Factor2
## Factor1 1.00 0.72
## Factor2 0.72 1.00
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 1922.05 on 64 degrees of freedom.
## The p-value is 0
Re-ran as a three-factor solution and the items behaved more as expected. Item #5 (I want to be recognized) loaded on the interest factor, and #1 (I see myself as) cross-loads. Everything else behaves as expected – potential solution to the dropped interest item?
EFA <- factanal(eid, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = eid, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## EID1 EID2 EID3 EID4 EID5 EID6 EID7 EID8 EID9 EID10 EID11 EID12
## 0.339 0.297 0.396 0.432 0.420 0.296 0.274 0.119 0.184 0.289 0.255 0.391
## EID13 EID14
## 0.192 0.284
##
## Loadings:
## Factor1 Factor2 Factor3
## EID10 0.774
## EID11 0.845
## EID12 0.770
## EID13 0.979
## EID14 0.807
## EID2 0.957
## EID3 0.665
## EID4 0.532
## EID6 0.941
## EID7 0.864
## EID5 0.596
## EID8 0.954
## EID9 0.920
## EID1 0.431 0.402
##
## Factor1 Factor2 Factor3
## SS loadings 3.632 3.560 2.336
## Proportion Var 0.259 0.254 0.167
## Cumulative Var 0.259 0.514 0.681
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 0.652 -0.728
## Factor2 0.652 1.000 -0.701
## Factor3 -0.728 -0.701 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 561.46 on 52 degrees of freedom.
## The p-value is 1.37e-86
First RID run – Heywood cases (loadings above 1 for #2, #10, and #11) and cross-loading. Dropped #1 (I see myself as) and re-ran.
##run efa for rid items
####NFACTORS####
ev <- eigen(cor(rid)) # get eigenvalues
ap <- parallel(subject=nrow(rid),var=ncol(rid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
EFA <- factanal(rid, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = rid, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## RID1 RID2 RID3 RID4 RID5 RID6 RID7 RID8 RID9 RID10 RID11 RID12
## 0.270 0.192 0.165 0.364 0.399 0.254 0.214 0.308 0.334 0.101 0.099 0.405
## RID13 RID14 RID15 RID16
## 0.315 0.449 0.357 0.350
##
## Loadings:
## Factor1 Factor2 Factor3
## RID1 0.629
## RID2 1.003
## RID3 0.944
## RID4 0.673
## RID6 0.962
## RID7 0.861
## RID8 0.746
## RID9 0.801
## RID10 1.008
## RID11 1.005
## RID13 0.753
## RID14 0.693
## RID15 0.891
## RID16 0.823
## RID5 0.421 0.473
## RID12 0.449
##
## Factor1 Factor2 Factor3
## SS loadings 4.742 3.563 2.785
## Proportion Var 0.296 0.223 0.174
## Cumulative Var 0.296 0.519 0.693
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 0.674 0.692
## Factor2 0.674 1.000 0.719
## Factor3 0.692 0.719 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 889.34 on 75 degrees of freedom.
## The p-value is 1.66e-139
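The Heywood cases noted for this first RID run can also be flagged programmatically rather than by eye. The helper below is hypothetical (not from any package) and is demonstrated on simulated data. It checks two things: uniquenesses sitting at factanal()'s default lower bound of 0.005, and loadings at or above 1 in absolute value. With an oblique rotation such as promax, a pattern loading above 1 is not by itself a Heywood case in the strict sense, so it may be worth looking at both columns.

```r
# Hypothetical helper: flag items whose uniqueness sits at the lower bound
# or whose largest absolute loading is >= 1.
flag_heywood <- function(fit, lower = 0.005, tol = 1e-3) {
  load_mat <- unclass(loadings(fit))        # plain numeric matrix of loadings
  max_abs  <- apply(abs(load_mat), 1, max)  # largest |loading| per item
  data.frame(
    item        = rownames(load_mat),
    uniqueness  = unname(fit$uniquenesses),
    max_loading = unname(max_abs),
    at_bound    = unname(fit$uniquenesses <= lower + tol),
    over_one    = unname(max_abs >= 1)
  )
}

# Simulated one-factor data for illustration only (not the study data).
set.seed(4)
n <- 400
f <- rnorm(n)
demo <- data.frame(
  x1 = f + rnorm(n, sd = 0.5),
  x2 = f + rnorm(n, sd = 0.7),
  x3 = f + rnorm(n, sd = 0.9),
  x4 = f + rnorm(n, sd = 1.1)
)
fit <- factanal(demo, factors = 1)
checks <- flag_heywood(fit)

# Rows worth a closer look (none are expected for this well-behaved example).
checks[checks$at_bound | checks$over_one, ]
```

Calling flag_heywood(EFA) after any of the factanal() fits in these notes would give the same table for the real items.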
Re-ran without #1; items loaded more or less as expected. Item #5 (I want to be recognized) is still cross-loading, and #12 (I can publish) is close to threshold.
rid2 <- subset(rid, select=c(2:16)) # drop RID1; columns 2-16 are RID2-RID16
##run efa for rid items without RID1
####NFACTORS####
ev <- eigen(cor(rid2)) # get eigenvalues
ap <- parallel(subject=nrow(rid2),var=ncol(rid2),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
EFA <- factanal(rid2, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = rid2, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## RID2 RID3 RID4 RID5 RID6 RID7 RID8 RID9 RID10 RID11 RID12 RID13
## 0.186 0.172 0.366 0.410 0.242 0.217 0.309 0.335 0.101 0.097 0.403 0.314
## RID14 RID15 RID16
## 0.449 0.358 0.350
##
## Loadings:
## Factor1 Factor2 Factor3
## RID2 0.990
## RID3 0.918
## RID4 0.655
## RID6 0.957
## RID7 0.840
## RID8 0.742
## RID9 0.795
## RID10 0.998
## RID11 0.996
## RID13 0.755
## RID14 0.695
## RID15 0.892
## RID16 0.830
## RID5 0.402 0.479
## RID12 0.448
##
## Factor1 Factor2 Factor3
## SS loadings 4.180 3.440 2.800
## Proportion Var 0.279 0.229 0.187
## Cumulative Var 0.279 0.508 0.695
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 0.652 0.690
## Factor2 0.652 1.000 0.714
## Factor3 0.690 0.714 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 790.46 on 63 degrees of freedom.
## The p-value is 2.66e-126
All advisor items loaded on one factor; good to go.
##run efa for adv items
####NFACTORS####
ev <- eigen(cor(adv)) # get eigenvalues
ap <- parallel(subject=nrow(adv),var=ncol(adv),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   1   1         1       1
####EFA####
EFA <- factanal(adv, factors = 1, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = adv, factors = 1, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## EXP_ad1 EXP_ad2 EXP_ad3 EXP_ad4 EXP_ad5 EXP_ad6 EXP_ad7 EXP_ad8
## 0.577 0.417 0.596 0.267 0.247 0.472 0.353 0.263
##
## Loadings:
## [1] 0.650 0.763 0.636 0.856 0.868 0.727 0.804 0.859
##
## Factor1
## SS loadings 4.807
## Proportion Var 0.601
##
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 361.4 on 20 degrees of freedom.
## The p-value is 1.99e-64
Suppressing the output for this one. Ran the three subscales (scientist, engineer, and researcher) separately (so three different EFAs); all items loaded on single factors as expected.
Ran an EFA for all IBM identity items together. First run – three-factor solution recommended by the scree plot; all three identities load on their own factors. Student items no longer load independently, so will try a four-factor solution and see what happens.
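Since the output for the three separate subscale EFAs is suppressed, here is a sketch of how such a run could be organized. The data are simulated, and the list names and the make_subscale() helper are invented for illustration; in the real analysis the inputs would presumably be the ibm_s, ibm_e, and ibm_r data frames.

```r
# Simulated stand-ins for the three subscale data frames; each has six items
# driven by a single factor. make_subscale() is a made-up helper.
set.seed(5)
make_subscale <- function(prefix, n = 300, k = 6) {
  f <- rnorm(n)
  out <- as.data.frame(sapply(seq_len(k), function(i) f + rnorm(n, sd = 0.7)))
  names(out) <- paste0(prefix, seq_len(k))
  out
}
subscales <- list(
  scientist  = make_subscale("s"),
  engineer   = make_subscale("e"),
  researcher = make_subscale("r")
)

# Fit a one-factor model to each subscale.
fits <- lapply(subscales, factanal, factors = 1)

# Collect the loadings: one column per subscale, one row per item position.
loading_table <- sapply(fits, function(fit) round(unname(fit$loadings[, 1]), 3))
loading_table
```

Looping over a named list keeps the three fits together and makes it easy to scan the loadings side by side.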
##run efa for ibm items together
####NFACTORS####
ev <- eigen(cor(ibm)) # get eigenvalues
ap <- parallel(subject=nrow(ibm),var=ncol(ibm),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   3   2         3       3
####EFA####
EFA <- factanal(ibm, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = ibm, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## IBM_ja_s IBM_ja_e IBM_ja_r IBM_pr_s IBM_pr_e IBM_pr_r IBM_c_s IBM_c_e
## 0.284 0.326 0.324 0.265 0.309 0.278 0.251 0.397
## IBM_c_r IBM_h_s IBM_h_e IBM_h_r IBM_co_s IBM_co_e IBM_co_r IBM_ov_s
## 0.269 0.273 0.331 0.362 0.261 0.287 0.269 0.541
## IBM_ov_e IBM_ov_r
## 0.655 0.780
##
## Loadings:
## Factor1 Factor2 Factor3
## IBM_ja_s 0.856
## IBM_pr_s 0.838
## IBM_c_s 0.863
## IBM_h_s 0.858
## IBM_co_s 0.849
## IBM_ov_s 0.700
## IBM_ja_e 0.823
## IBM_pr_e 0.833
## IBM_c_e 0.769
## IBM_h_e 0.820
## IBM_co_e 0.846
## IBM_ov_e 0.577
## IBM_ja_r 0.814
## IBM_pr_r 0.863
## IBM_c_r 0.856
## IBM_h_r 0.783
## IBM_co_r 0.863
## IBM_ov_r 0.322
##
## Factor1 Factor2 Factor3
## SS loadings 4.154 3.741 3.623
## Proportion Var 0.231 0.208 0.201
## Cumulative Var 0.231 0.439 0.640
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.0000 -0.0719 -0.5531
## Factor2 -0.0719 1.0000 0.0887
## Factor3 -0.5531 0.0887 1.0000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 1480.43 on 102 degrees of freedom.
## The p-value is 3.49e-243
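Re-ran with four factors to see if student items load separately – nope! Note that the uniqueness for IBM_ov_s (0.005) sits at factanal's default lower bound, so this solution should be read with caution. Might want to do a test for measurement invariance by year or milestone, to see if results from pilot can be replicated?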
####EFA####
EFA <- factanal(ibm, factors = 4, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = ibm, factors = 4, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## IBM_ja_s IBM_ja_e IBM_ja_r IBM_pr_s IBM_pr_e IBM_pr_r IBM_c_s IBM_c_e
## 0.283 0.327 0.323 0.254 0.309 0.279 0.248 0.395
## IBM_c_r IBM_h_s IBM_h_e IBM_h_r IBM_co_s IBM_co_e IBM_co_r IBM_ov_s
## 0.272 0.279 0.329 0.360 0.263 0.285 0.268 0.005
## IBM_ov_e IBM_ov_r
## 0.630 0.647
##
## Loadings:
## Factor1 Factor2 Factor3 Factor4
## IBM_ja_e 0.820
## IBM_pr_e 0.842
## IBM_c_e 0.785
## IBM_h_e 0.831
## IBM_co_e 0.859
## IBM_ov_e 0.530
## IBM_ja_r 0.822
## IBM_pr_r 0.865
## IBM_c_r 0.857
## IBM_h_r 0.792
## IBM_co_r 0.868
## IBM_ja_s 0.810
## IBM_pr_s 0.844
## IBM_c_s 0.850
## IBM_h_s 0.810
## IBM_co_s 0.825
## IBM_ov_s 0.924
## IBM_ov_r 0.308 0.438
##
## Factor1 Factor2 Factor3 Factor4
## SS loadings 3.736 3.672 3.523 1.107
## Proportion Var 0.208 0.204 0.196 0.062
## Cumulative Var 0.208 0.412 0.607 0.669
##
## Factor Correlations:
## Factor1 Factor2 Factor3 Factor4
## Factor1 1.00e+00 0.5333 -7.85e-05 0.489
## Factor2 5.33e-01 1.0000 -8.56e-02 0.391
## Factor3 -7.85e-05 -0.0856 1.00e+00 -0.279
## Factor4 4.89e-01 0.3909 -2.79e-01 1.000
##
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1120.67 on 87 degrees of freedom.
## The p-value is 3.39e-179
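On the measurement-invariance idea raised for the four-factor run above: a formal test would typically be a multi-group CFA, which is beyond these quick EFAs. As a rough first look, one could fit the same model within each group and compare the loadings with Tucker's congruence coefficient. This is a descriptive check, not an invariance test. The sketch below uses simulated data and a hypothetical grouping variable called `year`.

```r
# Tucker's congruence coefficient between two loading vectors.
congruence <- function(a, b) sum(a * b) / sqrt(sum(a^2) * sum(b^2))

# Simulated one-factor data with a made-up grouping variable (not the study data).
set.seed(6)
n <- 600
f <- rnorm(n)
demo <- as.data.frame(sapply(1:5, function(i) f + rnorm(n, sd = 0.8)))
names(demo) <- paste0("item", 1:5)
year <- rep(c("early", "late"), each = n / 2)

# Fit the same one-factor model within each group.
fits <- lapply(split(demo, year), factanal, factors = 1)
load_early <- fits$early$loadings[, 1]
load_late  <- fits$late$loadings[, 1]

# Values near 1 in absolute value suggest the same loading pattern in both groups.
phi <- congruence(load_early, load_late)
phi
```

With more than one factor, the factors would need to be matched across groups (and possibly rotated to a common target) before computing congruence column by column.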
First ibm_td run – two-factor solution recommended by the scree plot. Reading research, attending conferences, attending classes, completing course/homework, and being a TA load together; writing, presenting, and collaborating load together; and conducting research cross-loads on both factors, at or below threshold. Results are hard to interpret, so re-running with three factors.
##run efa for ibm_td items
####NFACTORS####
ev <- eigen(cor(ibm_td)) # get eigenvalues
ap <- parallel(subject=nrow(ibm_td),var=ncol(ibm_td),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)
nS
##   noc naf nparallel nkaiser
## 1   2   2         2       2
####EFA####
EFA <- factanal(ibm_td, factors = 2, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = ibm_td, factors = 2, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## IBM_td1 IBM_td2 IBM_td3 IBM_td4 IBM_td5 IBM_td6 IBM_td7 IBM_td8 IBM_td9
## 0.489 0.546 0.772 0.268 0.244 0.655 0.366 0.481 0.746
##
## Loadings:
## Factor1 Factor2
## IBM_td1 0.690
## IBM_td4 0.859
## IBM_td6 0.577
## IBM_td7 0.811
## IBM_td9 0.514
## IBM_td2 0.661
## IBM_td5 0.872
## IBM_td8 0.735
## IBM_td3 0.424
##
## Factor1 Factor2
## SS loadings 2.671 1.790
## Proportion Var 0.297 0.199
## Cumulative Var 0.297 0.496
##
## Factor Correlations:
## Factor1 Factor2
## Factor1 1.000 -0.199
## Factor2 -0.199 1.000
##
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 1141.36 on 19 degrees of freedom.
## The p-value is 3.29e-230
Tried with three factors. Item loadings are a bit cleaner, but there is an issue with one Heywood case (#4).
Reading research, attending conferences, and completing course/homework load together; writing, presenting, and collaborating load together; and attending class, being a TA, and conducting research load together. These factors are a bit more interpretable than the last ones. Check the literature to see how to handle the Heywood case? Maybe also talk to Dr. Meade.
####EFA####
EFA <- factanal(ibm_td, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
##
## Call:
## factanal(x = ibm_td, factors = 3, rotation = "promax", cutoff = 0.3)
##
## Uniquenesses:
## IBM_td1 IBM_td2 IBM_td3 IBM_td4 IBM_td5 IBM_td6 IBM_td7 IBM_td8 IBM_td9
## 0.541 0.547 0.712 0.105 0.245 0.437 0.402 0.484 0.354
##
## Loadings:
## Factor1 Factor2 Factor3
## IBM_td1 0.628
## IBM_td4 1.017
## IBM_td7 0.723
## IBM_td2 0.666
## IBM_td5 0.873
## IBM_td8 0.734
## IBM_td6 0.664
## IBM_td9 0.913
## IBM_td3 0.459
##
## Factor1 Factor2 Factor3
## SS loadings 2.000 1.805 1.515
## Proportion Var 0.222 0.201 0.168
## Cumulative Var 0.222 0.423 0.591
##
## Factor Correlations:
## Factor1 Factor2 Factor3
## Factor1 1.000 -0.184 0.665
## Factor2 -0.184 1.000 -0.194
## Factor3 0.665 -0.194 1.000
##
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 682.98 on 12 degrees of freedom.
## The p-value is 1.94e-138