I ran a few quick EFAs on the identity items and the advisor items, just to see how things were loading before I started working with them. Mostly everything behaved as expected; notes are included with each analysis below.

Correlation Matrices

The first step was running a bunch of correlation matrices. These use the data set with all NAs omitted (listwise deletion), since by default cor() returns NA for any pair of items with a missing value. Correlation plots are printed below.
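The trade-off between dropping incomplete rows and using `cor()`'s `use` argument can be sketched in base R. The toy data frame below is made up for illustration and is not the survey data.

```r
# Toy data with scattered missing values (made up; not the survey data)
toy <- data.frame(
  a = c(1, 2, 3, 4, 5, NA),
  b = c(2, 1, 4, 3, 6, 5),
  c = c(NA, 3, 2, 5, 4, 6)
)

# Default (use = "everything"): any NA in a pair makes that correlation NA
r_default <- cor(toy)

# Listwise deletion: only rows with no NAs (rows 2-5); same as cor(na.omit(toy))
r_listwise <- cor(toy, use = "complete.obs")

# Pairwise deletion: each pair uses every row where both values are present
r_pairwise <- cor(toy, use = "pairwise.complete.obs")

# How many rows survive listwise deletion
n_complete <- sum(complete.cases(toy))
```

Pairwise matrices keep more of the data, but because each cell can rest on a different subset of respondents, the result is not guaranteed to be positive definite, which can cause trouble for the eigenvalue and factor-analysis steps that follow.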

corrplot(sid_cor, order = "hclust")

corrplot(eid_cor, order = "hclust")

corrplot(rid_cor, order = "hclust")

corrplot(adv_cor, order = "hclust")

corrplot(ibm_cor, order = "hclust")

corrplot(ibm_td_cor, order = "hclust")

corrplot(ibm_s_cor, order = "hclust")

corrplot(ibm_e_cor, order = "hclust")

corrplot(ibm_r_cor, order = "hclust")

SID EFA

Items loaded as expected, except for #5 (I want to be recognized for my contributions to SCIENCE), which cross-loads on Factor1 (0.362) and Factor2 (0.417). For future analyses, we might want to drop this item from scale calculations. We might also consider re-running this with #1 (Overall I see myself as) dropped: it's loading on the recognition factor, but is on the low end.
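Cross-loading items like #5 can also be flagged programmatically instead of by eye. The sketch below uses a hypothetical helper, `flag_items()`, on simulated data with one item deliberately built to load on both factors; with the real data it would be called on a fitted `factanal()` object such as `EFA`.

```r
# Hypothetical helper: flag items by how many factors they load on.
# "Salient" here means |pattern loading| >= cutoff.
flag_items <- function(fit, cutoff = 0.3) {
  L <- unclass(fit$loadings)                 # items x factors pattern matrix
  n_salient <- rowSums(abs(L) >= cutoff)     # number of factors each item hits
  data.frame(
    item        = rownames(L),
    max_loading = round(apply(abs(L), 1, max), 3),
    n_salient   = n_salient,
    flag        = ifelse(n_salient == 0, "weak",
                  ifelse(n_salient > 1, "cross-loading", "")),
    row.names   = NULL
  )
}

# Simulated data: two factors with three clean items each,
# plus x7, which is built to load about equally on both factors
set.seed(42)
n  <- 1000
f1 <- rnorm(n)
f2 <- rnorm(n)
clean <- function(f) 0.8 * f + rnorm(n, sd = 0.6)   # unit-variance item
sim <- data.frame(
  x1 = clean(f1), x2 = clean(f1), x3 = clean(f1),
  x4 = clean(f2), x5 = clean(f2), x6 = clean(f2),
  x7 = 0.5 * f1 + 0.5 * f2 + rnorm(n, sd = sqrt(0.5))
)

fit   <- factanal(sim, factors = 2, rotation = "promax")
flags <- flag_items(fit, cutoff = 0.3)
flags[flags$flag != "", ]   # only the flagged items
```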

##run efa for sid items
####NFACTORS####
ev <- eigen(cor(sid)) # get eigenvalues
ap <- parallel(subject=nrow(sid),var=ncol(sid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
EFA <- factanal(sid, factors = 3, rotation = "promax", cutoff = 0.3) # cutoff is not used by factanal() itself; it only matters in print()
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = sid, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
##  SID1  SID2  SID3  SID4  SID5  SID6  SID7  SID8  SID9 SID10 SID11 SID12 
## 0.371 0.268 0.219 0.441 0.584 0.271 0.252 0.191 0.069 0.167 0.466 0.321 
## SID13 SID14 SID15 
## 0.341 0.444 0.316 
## 
## Loadings:
##       Factor1 Factor2 Factor3
## SID1   0.694                 
## SID2   0.898                 
## SID3   0.907                 
## SID4   0.749                 
## SID6   0.886                 
## SID7   0.898                 
## SID8           0.912         
## SID9           0.999         
## SID10          0.913         
## SID11                  0.519 
## SID12                  0.838 
## SID13                  0.811 
## SID14                  0.800 
## SID15                  0.821 
## SID5   0.362   0.417         
## 
##                Factor1 Factor2 Factor3
## SS loadings      4.414   2.964   2.952
## Proportion Var   0.294   0.198   0.197
## Cumulative Var   0.294   0.492   0.689
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.485   0.613
## Factor2   0.485   1.000   0.432
## Factor3   0.613   0.432   1.000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 502.94 on 63 degrees of freedom.
## The p-value is 7.78e-70

EID EFA

First run: the two-factor solution recommended by the scree test did not behave as expected. A closer look reveals there are only two items on the interest factor. Did we do this for a reason, or did an item get lost in the shuffle? The result is the interest/pc items loading together on one factor, with the recognition items loading on their own as expected.
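A quick way to spot thin factors like the two-item interest factor is to count how many items have their primary loading on each factor. The pattern matrix below is hypothetical; with a real fit, `L` would be `unclass(EFA$loadings)`. The three-items-per-factor threshold is a common rule of thumb, not a hard requirement.

```r
# Hypothetical pattern matrix: rows are items, columns are factors
L <- matrix(c(0.80, 0.05,
              0.75, 0.10,
              0.70, 0.02,
              0.10, 0.85,
              0.05, 0.60),
            ncol = 2, byrow = TRUE,
            dimnames = list(paste0("item", 1:5), c("F1", "F2")))

# Factor on which each item loads most strongly (by absolute value)
primary <- colnames(L)[apply(abs(L), 1, which.max)]

# Count primary items per factor, keeping factors with zero items in the table
items_per_factor <- table(factor(primary, levels = colnames(L)))

# Rule of thumb: flag factors defined by fewer than three items
under_identified <- names(items_per_factor)[as.vector(items_per_factor) < 3]
```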

##run efa for eid items
####NFACTORS####
ev <- eigen(cor(eid)) # get eigenvalues
ap <- parallel(subject=nrow(eid),var=ncol(eid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   2   1         2       2
####EFA####
EFA <- factanal(eid, factors = 2, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = eid, factors = 2, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
##  EID1  EID2  EID3  EID4  EID5  EID6  EID7  EID8  EID9 EID10 EID11 EID12 
## 0.360 0.338 0.395 0.430 0.501 0.318 0.275 0.387 0.428 0.281 0.255 0.412 
## EID13 EID14 
## 0.244 0.293 
## 
## Loadings:
##       Factor1 Factor2
## EID8   0.504   0.337 
## EID9   0.612         
## EID10  0.911         
## EID11  0.913         
## EID12  0.779         
## EID13  0.956         
## EID14  0.858         
## EID1           0.588 
## EID2           0.929 
## EID3           0.694 
## EID4           0.553 
## EID6           0.955 
## EID7           0.907 
## EID5           0.483 
## 
##                Factor1 Factor2
## SS loadings      4.843   4.143
## Proportion Var   0.346   0.296
## Cumulative Var   0.346   0.642
## 
## Factor Correlations:
##         Factor1 Factor2
## Factor1    1.00    0.72
## Factor2    0.72    1.00
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 1922.05 on 64 degrees of freedom.
## The p-value is 0

Re-ran as a three-factor solution and items behaved more as expected. Item #5 (I want to be recognized) loaded on the interest factor, and #1 (I see myself as) cross-loads. Everything else behaves as expected; could this be a fix for the missing interest item?

EFA <- factanal(eid, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = eid, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
##  EID1  EID2  EID3  EID4  EID5  EID6  EID7  EID8  EID9 EID10 EID11 EID12 
## 0.339 0.297 0.396 0.432 0.420 0.296 0.274 0.119 0.184 0.289 0.255 0.391 
## EID13 EID14 
## 0.192 0.284 
## 
## Loadings:
##       Factor1 Factor2 Factor3
## EID10  0.774                 
## EID11  0.845                 
## EID12  0.770                 
## EID13  0.979                 
## EID14  0.807                 
## EID2           0.957         
## EID3           0.665         
## EID4           0.532         
## EID6           0.941         
## EID7           0.864         
## EID5                   0.596 
## EID8                   0.954 
## EID9                   0.920 
## EID1           0.431   0.402 
## 
##                Factor1 Factor2 Factor3
## SS loadings      3.632   3.560   2.336
## Proportion Var   0.259   0.254   0.167
## Cumulative Var   0.259   0.514   0.681
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.652  -0.728
## Factor2   0.652   1.000  -0.701
## Factor3  -0.728  -0.701   1.000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 561.46 on 52 degrees of freedom.
## The p-value is 1.37e-86
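Since the two- and three-factor EID solutions are nested maximum-likelihood fits, their reported chi-square statistics can be compared directly as a rough check on whether the third factor earns its keep. This is only approximate: factanal() applies a Bartlett correction whose multiplier depends on the number of factors, so the difference of the corrected statistics is not exactly the likelihood-ratio statistic. With a sample this size, nearly any difference will also be significant, so interpretability should still carry most of the weight.

```r
# Chi-square statistics reported by factanal() for the two EID solutions above
chisq_2f <- 1922.05; df_2f <- 64
chisq_3f <-  561.46; df_3f <- 52

# Approximate chi-square difference test for adding the third factor
delta_chisq <- chisq_2f - chisq_3f   # 1360.59
delta_df    <- df_2f - df_3f         # 12
p_value     <- pchisq(delta_chisq, df = delta_df, lower.tail = FALSE)
```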

RID EFA

First run: possible Heywood cases (loadings above 1.0 on RID2, RID10, and RID11) and cross-loading. Dropped #1 (I see myself as) and re-ran.
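Whether RID2, RID10, and RID11 count as Heywood cases depends on the definition. Their pattern loadings exceed 1.0, but their uniquenesses (0.192, 0.101, 0.099) are well above factanal()'s default lower bound of 0.005, and under an oblique rotation such as promax a pattern loading above 1 is not improper by itself. The hypothetical helper below checks both signals; it is demonstrated on a mock object with made-up values, not on the RID results.

```r
# Hypothetical helper: flag possible improper solutions in a factanal() fit.
# A strict Heywood case is a uniqueness at or below zero; factanal() bounds
# uniquenesses at control$lower (0.005 by default), so values sitting at that
# bound are the tell-tale sign. Pattern loadings above 1 are reported separately.
heywood_check <- function(fit, lower = 0.005, tol = 1e-4) {
  u <- fit$uniquenesses
  L <- unclass(fit$loadings)
  list(
    at_lower_bound = names(u)[u <= lower + tol],
    loading_over_1 = rownames(L)[apply(abs(L) > 1, 1, any)]
  )
}

# Mock object with the two components a factanal() fit provides (made-up values)
mock_fit <- list(
  uniquenesses = c(A = 0.005, B = 0.250, C = 0.100),
  loadings = matrix(c(0.95, 1.02, 0.60,    # Factor1 column
                      0.10, 0.00, 0.45),   # Factor2 column
                    ncol = 2,
                    dimnames = list(c("A", "B", "C"), c("Factor1", "Factor2")))
)

res <- heywood_check(mock_fit)
res
```

With the real data this would be `heywood_check(EFA)`.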

##run efa for rid items
####NFACTORS####
ev <- eigen(cor(rid)) # get eigenvalues
ap <- parallel(subject=nrow(rid),var=ncol(rid),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
EFA <- factanal(rid, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = rid, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
##  RID1  RID2  RID3  RID4  RID5  RID6  RID7  RID8  RID9 RID10 RID11 RID12 
## 0.270 0.192 0.165 0.364 0.399 0.254 0.214 0.308 0.334 0.101 0.099 0.405 
## RID13 RID14 RID15 RID16 
## 0.315 0.449 0.357 0.350 
## 
## Loadings:
##       Factor1 Factor2 Factor3
## RID1   0.629                 
## RID2   1.003                 
## RID3   0.944                 
## RID4   0.673                 
## RID6   0.962                 
## RID7   0.861                 
## RID8           0.746         
## RID9           0.801         
## RID10          1.008         
## RID11          1.005         
## RID13                  0.753 
## RID14                  0.693 
## RID15                  0.891 
## RID16                  0.823 
## RID5   0.421   0.473         
## RID12                  0.449 
## 
##                Factor1 Factor2 Factor3
## SS loadings      4.742   3.563   2.785
## Proportion Var   0.296   0.223   0.174
## Cumulative Var   0.296   0.519   0.693
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.674   0.692
## Factor2   0.674   1.000   0.719
## Factor3   0.692   0.719   1.000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 889.34 on 75 degrees of freedom.
## The p-value is 1.66e-139

Re-ran; items loaded more or less as expected. Item #5 (I want to be recognized) still cross-loads, #12 (I can publish) is close to threshold, and RID2, RID10, and RID11 still load at about 1.0.
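Dropping #1 by position (`select=c(2:16)`) works as long as RID1 stays the first column. Selecting by name is a little more robust if the column order ever changes; here is a minimal sketch on a hypothetical stand-in for `rid`.

```r
# Toy stand-in for the rid data frame (hypothetical values)
rid_demo <- data.frame(RID1 = 1:3, RID2 = 4:6, RID3 = 7:9)

# Positional selection, mirroring the approach used for rid2
by_position <- subset(rid_demo, select = c(2:3))

# Name-based selection: drop RID1 wherever it sits
by_name <- rid_demo[, setdiff(names(rid_demo), "RID1"), drop = FALSE]

# subset() also accepts a bare column name with a minus sign
by_subset <- subset(rid_demo, select = -RID1)
```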

rid2 <- subset(rid, select=c(2:16))  # drop column 1 (RID1)
##run efa for rid items
####NFACTORS####
ev <- eigen(cor(rid2)) # get eigenvalues
ap <- parallel(subject=nrow(rid2),var=ncol(rid2),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   3   1         3       3
####EFA####
EFA <- factanal(rid2, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = rid2, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
##  RID2  RID3  RID4  RID5  RID6  RID7  RID8  RID9 RID10 RID11 RID12 RID13 
## 0.186 0.172 0.366 0.410 0.242 0.217 0.309 0.335 0.101 0.097 0.403 0.314 
## RID14 RID15 RID16 
## 0.449 0.358 0.350 
## 
## Loadings:
##       Factor1 Factor2 Factor3
## RID2   0.990                 
## RID3   0.918                 
## RID4   0.655                 
## RID6   0.957                 
## RID7   0.840                 
## RID8           0.742         
## RID9           0.795         
## RID10          0.998         
## RID11          0.996         
## RID13                  0.755 
## RID14                  0.695 
## RID15                  0.892 
## RID16                  0.830 
## RID5   0.402   0.479         
## RID12                  0.448 
## 
##                Factor1 Factor2 Factor3
## SS loadings      4.180   3.440   2.800
## Proportion Var   0.279   0.229   0.187
## Cumulative Var   0.279   0.508   0.695
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000   0.652   0.690
## Factor2   0.652   1.000   0.714
## Factor3   0.690   0.714   1.000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 790.46 on 63 degrees of freedom.
## The p-value is 2.66e-126

ADV EFA

All items loaded on one factor, good to go.

##run efa for adv items
####NFACTORS####
ev <- eigen(cor(adv)) # get eigenvalues
ap <- parallel(subject=nrow(adv),var=ncol(adv),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   1   1         1       1
####EFA####
EFA <- factanal(adv, factors = 1, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = adv, factors = 1, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
## EXP_ad1 EXP_ad2 EXP_ad3 EXP_ad4 EXP_ad5 EXP_ad6 EXP_ad7 EXP_ad8 
##   0.577   0.417   0.596   0.267   0.247   0.472   0.353   0.263 
## 
## Loadings:
## [1] 0.650 0.763 0.636 0.856 0.868 0.727 0.804 0.859
## 
##                Factor1
## SS loadings      4.807
## Proportion Var   0.601
## 
## Test of the hypothesis that 1 factor is sufficient.
## The chi square statistic is 361.4 on 20 degrees of freedom.
## The p-value is 1.99e-64
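As a sanity check on the one-factor ADV output, the summary lines can be reproduced from the printed loadings. With standardized items, each uniqueness is 1 minus the squared loading, "SS loadings" is the sum of squared loadings, and "Proportion Var" divides that by the number of items. Small discrepancies come from the loadings being rounded to three decimals. (With a single factor, the promax rotation requested in the call has no effect.)

```r
# Loadings printed for the one-factor ADV solution above (EXP_ad1 to EXP_ad8)
lambda <- c(0.650, 0.763, 0.636, 0.856, 0.868, 0.727, 0.804, 0.859)

ss_loadings <- sum(lambda^2)                 # "SS loadings", about 4.807
prop_var    <- ss_loadings / length(lambda)  # "Proportion Var", about 0.601
uniqueness  <- 1 - lambda^2                  # matches the printed uniquenesses
```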

IBM Identities EFA

Suppressing the output for this one. Ran the three subscales (scientist, engineer, and researcher) separately (so three different EFAs); all items loaded on single factors as expected.
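The three separate runs can be looped instead of written out. This sketch assumes, based on the item names in the combined output, that the `_s`, `_e`, and `_r` suffixes mark the scientist, engineer, and researcher subscales; `run_by_subscale()` is a hypothetical helper.

```r
# IBM identity item names, as they appear in the combined output
item_names <- c(outer(c("IBM_ja", "IBM_pr", "IBM_c", "IBM_h", "IBM_co", "IBM_ov"),
                      c("s", "e", "r"), paste, sep = "_"))

# Group items by their final suffix (assumed: s = scientist,
# e = engineer, r = researcher)
subscales <- split(item_names, sub(".*_", "", item_names))

# Hypothetical helper: one single-factor EFA per subscale, returned as a list
run_by_subscale <- function(data, subscales, factors = 1) {
  lapply(subscales, function(cols) factanal(data[, cols], factors = factors))
}

# Usage with the real data: fits <- run_by_subscale(ibm, subscales); fits$s
```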

IBM EFA

Ran the EFA for all IBM identity items together. First run: the three-factor solution recommended by the scree plot, with all three identities loading on their own factors. Student items no longer load independently; I'll try a four-factor solution and see what happens.

##run efa for ibm items together
####NFACTORS####
ev <- eigen(cor(ibm)) # get eigenvalues
ap <- parallel(subject=nrow(ibm),var=ncol(ibm),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   3   2         3       3
####EFA####
EFA <- factanal(ibm, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = ibm, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
## IBM_ja_s IBM_ja_e IBM_ja_r IBM_pr_s IBM_pr_e IBM_pr_r  IBM_c_s  IBM_c_e 
##    0.284    0.326    0.324    0.265    0.309    0.278    0.251    0.397 
##  IBM_c_r  IBM_h_s  IBM_h_e  IBM_h_r IBM_co_s IBM_co_e IBM_co_r IBM_ov_s 
##    0.269    0.273    0.331    0.362    0.261    0.287    0.269    0.541 
## IBM_ov_e IBM_ov_r 
##    0.655    0.780 
## 
## Loadings:
##          Factor1 Factor2 Factor3
## IBM_ja_s  0.856                 
## IBM_pr_s  0.838                 
## IBM_c_s   0.863                 
## IBM_h_s   0.858                 
## IBM_co_s  0.849                 
## IBM_ov_s  0.700                 
## IBM_ja_e          0.823         
## IBM_pr_e          0.833         
## IBM_c_e           0.769         
## IBM_h_e           0.820         
## IBM_co_e          0.846         
## IBM_ov_e          0.577         
## IBM_ja_r                  0.814 
## IBM_pr_r                  0.863 
## IBM_c_r                   0.856 
## IBM_h_r                   0.783 
## IBM_co_r                  0.863 
## IBM_ov_r                  0.322 
## 
##                Factor1 Factor2 Factor3
## SS loadings      4.154   3.741   3.623
## Proportion Var   0.231   0.208   0.201
## Cumulative Var   0.231   0.439   0.640
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1  1.0000 -0.0719 -0.5531
## Factor2 -0.0719  1.0000  0.0887
## Factor3 -0.5531  0.0887  1.0000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 1480.43 on 102 degrees of freedom.
## The p-value is 3.49e-243
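One way to confirm that each identity loads on its own factor is to cross-tabulate each item's suffix against the factor it loads on most strongly. The primary-factor assignments below are transcribed from the three-factor output above; with the fitted object they could be computed as `apply(abs(unclass(EFA$loadings)), 1, which.max)`. The suffix-to-identity mapping is again an assumption.

```r
# Primary factor for each IBM item in the three-factor solution above
primary <- c(IBM_ja_s = 1, IBM_pr_s = 1, IBM_c_s = 1, IBM_h_s = 1, IBM_co_s = 1, IBM_ov_s = 1,
             IBM_ja_e = 2, IBM_pr_e = 2, IBM_c_e = 2, IBM_h_e = 2, IBM_co_e = 2, IBM_ov_e = 2,
             IBM_ja_r = 3, IBM_pr_r = 3, IBM_c_r = 3, IBM_h_r = 3, IBM_co_r = 3, IBM_ov_r = 3)

suffix <- sub(".*_", "", names(primary))   # "s", "e", or "r"
tab <- table(suffix, factor = primary)
tab

# A clean structure puts every item with the same suffix on a single factor,
# i.e. exactly one non-zero cell per row of the table
clean_structure <- all(rowSums(tab > 0) == 1)
```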

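Re-ran with four factors to see if the student items load separately: nope! We might want to test for measurement invariance by year or milestone, to see whether the results from the pilot replicate. Also note that IBM_ov_s has a uniqueness of 0.005, which is factanal()'s default lower bound, so the fourth factor may be resting on a Heywood case.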
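If we go the measurement-invariance route, one common approach is a multi-group CFA in the lavaan package, comparing a configural model against one that constrains the loadings to be equal across groups. The sketch below uses lavaan's built-in `HolzingerSwineford1939` data with `school` as the grouping variable, purely as a stand-in; for our data the model would list the IBM items and group on a year or milestone column (the column name `year` in the comment is hypothetical).

```r
library(lavaan)

# Three-factor CFA on lavaan's example data (stand-in for the IBM items)
model <- '
  visual  =~ x1 + x2 + x3
  textual =~ x4 + x5 + x6
  speed   =~ x7 + x8 + x9
'

# Configural model: same structure in each group, parameters free
configural <- cfa(model, data = HolzingerSwineford1939, group = "school")

# Metric (weak) invariance: loadings constrained equal across groups
metric <- cfa(model, data = HolzingerSwineford1939, group = "school",
              group.equal = "loadings")

# Chi-square difference test between the nested models
lavTestLRT(configural, metric)

# Adapted to our data, this might look like (hypothetical names):
# cfa(ibm_model, data = survey_data, group = "year", group.equal = "loadings")
```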

####EFA####
EFA <- factanal(ibm, factors = 4, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = ibm, factors = 4, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
## IBM_ja_s IBM_ja_e IBM_ja_r IBM_pr_s IBM_pr_e IBM_pr_r  IBM_c_s  IBM_c_e 
##    0.283    0.327    0.323    0.254    0.309    0.279    0.248    0.395 
##  IBM_c_r  IBM_h_s  IBM_h_e  IBM_h_r IBM_co_s IBM_co_e IBM_co_r IBM_ov_s 
##    0.272    0.279    0.329    0.360    0.263    0.285    0.268    0.005 
## IBM_ov_e IBM_ov_r 
##    0.630    0.647 
## 
## Loadings:
##          Factor1 Factor2 Factor3 Factor4
## IBM_ja_e  0.820                         
## IBM_pr_e  0.842                         
## IBM_c_e   0.785                         
## IBM_h_e   0.831                         
## IBM_co_e  0.859                         
## IBM_ov_e  0.530                         
## IBM_ja_r          0.822                 
## IBM_pr_r          0.865                 
## IBM_c_r           0.857                 
## IBM_h_r           0.792                 
## IBM_co_r          0.868                 
## IBM_ja_s                  0.810         
## IBM_pr_s                  0.844         
## IBM_c_s                   0.850         
## IBM_h_s                   0.810         
## IBM_co_s                  0.825         
## IBM_ov_s                          0.924 
## IBM_ov_r          0.308           0.438 
## 
##                Factor1 Factor2 Factor3 Factor4
## SS loadings      3.736   3.672   3.523   1.107
## Proportion Var   0.208   0.204   0.196   0.062
## Cumulative Var   0.208   0.412   0.607   0.669
## 
## Factor Correlations:
##           Factor1 Factor2   Factor3 Factor4
## Factor1  1.00e+00  0.5333 -7.85e-05   0.489
## Factor2  5.33e-01  1.0000 -8.56e-02   0.391
## Factor3 -7.85e-05 -0.0856  1.00e+00  -0.279
## Factor4  4.89e-01  0.3909 -2.79e-01   1.000
## 
## Test of the hypothesis that 4 factors are sufficient.
## The chi square statistic is 1120.67 on 87 degrees of freedom.
## The p-value is 3.39e-179

IBM Task Difficulty EFA

First run: the two-factor solution recommended by the scree plot. Reading research, attending conferences, attending classes, completing course/homework, and being a TA load together; writing, presenting, and collaborating load together; and conducting research cross-loads on both factors, at or below threshold. The results are hard to interpret, so I'm re-running with three factors.
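Instead of re-running one factor count at a time, the candidate solutions can be fit in a loop and their fit statistics lined up. The sketch uses simulated data with a built-in three-factor structure (nine items, so the degrees of freedom match the 19 and 12 reported for IBM_td), and `compare_k()` is a hypothetical helper. With samples this large, the chi-square test rejects almost every model, as all the p-values in this document show, so interpretability should still drive the final choice.

```r
# Simulated stand-in for ibm_td: nine items, three uncorrelated factors,
# three items per factor (all values hypothetical)
set.seed(123)
n <- 500
f <- matrix(rnorm(n * 3), nrow = n, ncol = 3)
lambda <- kronecker(diag(3), matrix(0.7, nrow = 3, ncol = 1))  # 9 x 3 loadings
x <- f %*% t(lambda) + matrix(rnorm(n * 9, sd = sqrt(1 - 0.7^2)), nrow = n)
colnames(x) <- paste0("td", 1:9)

# Fit each candidate number of factors and collect factanal()'s fit statistics
compare_k <- function(data, ks) {
  do.call(rbind, lapply(ks, function(k) {
    fit <- factanal(data, factors = k,
                    rotation = if (k > 1) "promax" else "none")
    data.frame(factors = k,
               chisq   = round(unname(fit$STATISTIC), 2),
               df      = fit$dof,
               p       = signif(unname(fit$PVAL), 3))
  }))
}

res <- compare_k(x, 1:3)
res
```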

##run efa for ibm_td items
####NFACTORS####
ev <- eigen(cor(ibm_td)) # get eigenvalues
ap <- parallel(subject=nrow(ibm_td),var=ncol(ibm_td),rep=100,cent=.05)
nS <- nScree(x=ev$values, aparallel=ap$eigen$qevpea)
plotnScree(nS)

nS
##   noc naf nparallel nkaiser
## 1   2   2         2       2
####EFA####
EFA <- factanal(ibm_td, factors = 2, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = ibm_td, factors = 2, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
## IBM_td1 IBM_td2 IBM_td3 IBM_td4 IBM_td5 IBM_td6 IBM_td7 IBM_td8 IBM_td9 
##   0.489   0.546   0.772   0.268   0.244   0.655   0.366   0.481   0.746 
## 
## Loadings:
##         Factor1 Factor2
## IBM_td1  0.690         
## IBM_td4  0.859         
## IBM_td6  0.577         
## IBM_td7  0.811         
## IBM_td9  0.514         
## IBM_td2          0.661 
## IBM_td5          0.872 
## IBM_td8          0.735 
## IBM_td3  0.424         
## 
##                Factor1 Factor2
## SS loadings      2.671   1.790
## Proportion Var   0.297   0.199
## Cumulative Var   0.297   0.496
## 
## Factor Correlations:
##         Factor1 Factor2
## Factor1   1.000  -0.199
## Factor2  -0.199   1.000
## 
## Test of the hypothesis that 2 factors are sufficient.
## The chi square statistic is 1141.36 on 19 degrees of freedom.
## The p-value is 3.29e-230

Tried with three factors. Item loadings are a bit cleaner, but there is one Heywood case (#4).

Reading research, attending conferences, and completing course/homework load together; writing, presenting, and collaborating load together; and attending class, being a TA, and conducting research load together. These factors are a bit more interpretable than the last ones. Check the literature on how to handle the Heywood case? Maybe also talk to Dr. Meade.

####EFA####
EFA <- factanal(ibm_td, factors = 3, rotation = "promax", cutoff = 0.3)
print(EFA, digits=3, cutoff=.3, sort=TRUE)
## 
## Call:
## factanal(x = ibm_td, factors = 3, rotation = "promax", cutoff = 0.3)
## 
## Uniquenesses:
## IBM_td1 IBM_td2 IBM_td3 IBM_td4 IBM_td5 IBM_td6 IBM_td7 IBM_td8 IBM_td9 
##   0.541   0.547   0.712   0.105   0.245   0.437   0.402   0.484   0.354 
## 
## Loadings:
##         Factor1 Factor2 Factor3
## IBM_td1  0.628                 
## IBM_td4  1.017                 
## IBM_td7  0.723                 
## IBM_td2          0.666         
## IBM_td5          0.873         
## IBM_td8          0.734         
## IBM_td6                  0.664 
## IBM_td9                  0.913 
## IBM_td3                  0.459 
## 
##                Factor1 Factor2 Factor3
## SS loadings      2.000   1.805   1.515
## Proportion Var   0.222   0.201   0.168
## Cumulative Var   0.222   0.423   0.591
## 
## Factor Correlations:
##         Factor1 Factor2 Factor3
## Factor1   1.000  -0.184   0.665
## Factor2  -0.184   1.000  -0.194
## Factor3   0.665  -0.194   1.000
## 
## Test of the hypothesis that 3 factors are sufficient.
## The chi square statistic is 682.98 on 12 degrees of freedom.
## The p-value is 1.94e-138
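On the Heywood question, it may help that in an oblique (promax) solution a pattern loading slightly above 1, like IBM_td4's 1.017, does not by itself make the solution improper. An item's communality is h2 = p' Phi p, where p is its row of pattern loadings and Phi the factor correlation matrix, and the uniqueness 1 - h2 can stay positive (IBM_td4's is 0.105). Phi below is the three-factor IBM_td factor correlation matrix from the output above; the pattern row p is hypothetical, because the printed output hides loadings below 0.3.

```r
# Communality of one item in an oblique solution: h2 = p' Phi p
communality <- function(p, Phi) drop(t(p) %*% Phi %*% p)

# Factor correlations from the three-factor IBM_td solution above
Phi <- matrix(c( 1.000, -0.184,  0.665,
                -0.184,  1.000, -0.194,
                 0.665, -0.194,  1.000), nrow = 3, byrow = TRUE)

# Hypothetical pattern row: primary loading above 1, negative secondary loading
p <- c(1.05, 0, -0.25)

h2 <- communality(p, Phi)  # 1.05^2 + 0.25^2 - 2 * 1.05 * 0.25 * 0.665 = 0.815875
u  <- 1 - h2               # positive, so no negative uniqueness despite loading > 1
```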