Let’s revisit how we are thinking about engagement and interest. For now, let’s just look at a correlation matrix of the ESM variables concentration, hard work, enjoyment, and interest. Since all of these could potentially be part of an engagement measure, let’s see what the measure would look like with just concentration and hard work vs. concentration, hard work, and interest vs. concentration, hard work, and enjoyment vs. all four. Also, is there any rationale for putting interest and enjoyment together and keeping that pair separate from the concentration and hard work pair? Here I just want us to understand our data better. What happens when all of these are put in a factor analysis?
Presently, here’s how our engagement measures are made:
ss <- esm %>%
dplyr::select(concentrating, hard_working, enjoy, interest)
ss %>%
correlate() %>%
shave() %>%
fashion()
##         rowname concentrating hard_working enjoy interest
## 1 concentrating
## 2  hard_working           .62
## 3         enjoy           .59          .57
## 4      interest           .57          .54   .66
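To compare the candidate composites raised above, one rough option is to compute standardized alpha for each item subset directly from these rounded correlations. This is only a sketch: `std_alpha` and `candidates` are names introduced here for illustration, and values computed from two-decimal correlations can differ slightly from what `psych::alpha()` reports on the raw data.

```r
# Correlations copied from the shaved matrix above (rounded to two decimals).
items <- c("concentrating", "hard_working", "enjoy", "interest")
R <- matrix(c(1.00, 0.62, 0.59, 0.57,
              0.62, 1.00, 0.57, 0.54,
              0.59, 0.57, 1.00, 0.66,
              0.57, 0.54, 0.66, 1.00),
            nrow = 4, byrow = TRUE, dimnames = list(items, items))

# Standardized alpha from the mean inter-item correlation:
# alpha = k * r_bar / (1 + (k - 1) * r_bar)
std_alpha <- function(R, vars) {
  sub <- R[vars, vars, drop = FALSE]
  k <- length(vars)
  r_bar <- mean(sub[lower.tri(sub)])
  k * r_bar / (1 + (k - 1) * r_bar)
}

# Hypothetical labels for the item subsets discussed in the notes.
candidates <- list(
  conc_work          = c("concentrating", "hard_working"),
  conc_work_interest = c("concentrating", "hard_working", "interest"),
  conc_work_enjoy    = c("concentrating", "hard_working", "enjoy"),
  interest_enjoy     = c("interest", "enjoy"),
  all_four           = items
)

alphas <- sapply(candidates, std_alpha, R = R)
round(alphas, 2)
```

For the two-item subsets this reduces to the Spearman-Brown formula applied to the single correlation, so those values say little beyond the correlation itself.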
This seems to suggest there is one factor (a two-factor solution does not seem to fit). Note that the PA1 column in the factor analysis output holds the factor loadings:
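determine_n_factors <- function(d) {
  library(nFactors)
  d <- d[complete.cases(d), ] # keep complete cases only
  ev <- eigen(cor(d)) # get eigenvalues
  # eigenvalues from random data, used as the parallel-analysis benchmark
  ap <- nFactors::parallel(subject = nrow(d), var = ncol(d),
                           rep = 100, cent = .05)
  nS <- nScree(x = ev$values, aparallel = ap$eigen$qevpea)
  plotnScree(nS) # scree plot with the suggested number of factors
}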
determine_n_factors(ss)
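As a cruder cross-check on the scree plot and parallel analysis, the eigenvalues of the four-item correlation matrix can be inspected directly. The sketch below re-enters the rounded correlations reported earlier (so the eigenvalues are approximate) and applies the Kaiser rule (eigenvalues greater than 1), which is a different and simpler criterion than the parallel analysis in `determine_n_factors()`. The name `n_kaiser` is introduced here for illustration.

```r
# Rounded correlations for the four ESM items, as reported in these notes.
items <- c("concentrating", "hard_working", "enjoy", "interest")
R <- matrix(c(1.00, 0.62, 0.59, 0.57,
              0.62, 1.00, 0.57, 0.54,
              0.59, 0.57, 1.00, 0.66,
              0.57, 0.54, 0.66, 1.00),
            nrow = 4, byrow = TRUE, dimnames = list(items, items))

# Eigenvalues of the correlation matrix; they sum to the number of items.
ev <- eigen(R, symmetric = TRUE)$values
round(ev, 2)

# Kaiser rule: count eigenvalues greater than 1.
n_kaiser <- sum(ev > 1)
n_kaiser
```

The Kaiser rule tends to be less reliable than parallel analysis, so it is best read as a sanity check rather than as a replacement.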
ssc <- ss[complete.cases(ss), ]
library(psych)
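# NOTE: psych::fa() names this argument `rotate`, not `rotation`, so the value
# below is not matched to it. That is harmless here: a one-factor solution is not rotated.
fit1 <- fa(ssc, nfactors=1, rotation="Promax", fm = "pa")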
fit1
## Factor Analysis using method = pa
## Call: fa(r = ssc, nfactors = 1, fm = "pa", rotation = "Promax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 h2 u2 com
## concentrating 0.78 0.60 0.40 1
## hard_working 0.74 0.55 0.45 1
## enjoy 0.80 0.64 0.36 1
## interest 0.77 0.59 0.41 1
##
## PA1
## SS loadings 2.37
## Proportion Var 0.59
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 6 and the objective function was 1.72 with Chi Square of 5073.69
## The degrees of freedom for the model are 2 and the objective function was 0.04
##
## The root mean square of the residuals (RMSR) is 0.03
## The df corrected root mean square of the residuals is 0.06
##
## The harmonic number of observations is 2957 with the empirical chi square 40.13 with prob < 1.9e-09
## The total number of observations was 2957 with Likelihood Chi Square = 113.95 with prob < 1.8e-25
##
## Tucker Lewis Index of factoring reliability = 0.934
## RMSEA index = 0.138 and the 90 % confidence intervals are 0.117 0.16
## BIC = 97.97
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1
## Correlation of (regression) scores with factors 0.92
## Multiple R square of scores with factors 0.85
## Minimum correlation of possible factor scores 0.71
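To read the one-factor output: with a single factor, each communality (`h2`) is the squared loading, the uniqueness (`u2`) is one minus that, and "SS loadings" is the sum of the squared loadings. A small sketch using the rounded loadings printed above (so the results only approximate the printed values):

```r
# Rounded PA1 loadings copied from the fa() output above.
loadings <- c(concentrating = 0.78, hard_working = 0.74,
              enjoy = 0.80, interest = 0.77)

h2 <- loadings^2          # communality: variance explained by the factor
u2 <- 1 - h2              # uniqueness: variance left unexplained
ss_loadings <- sum(h2)    # "SS loadings" line in the output
prop_var <- ss_loadings / length(loadings)  # "Proportion Var" line

round(cbind(loading = loadings, h2 = h2, u2 = u2), 2)
round(c(ss_loadings = ss_loadings, prop_var = prop_var), 2)
```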
# fit2 <- fa(ssc, nfactors=2, rotation="Promax", fm = "pa")
# fit2
psych::alpha(ssc)
##
## Reliability analysis
## Call: psych::alpha(x = ssc)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.85 0.85 0.82 0.59 5.8 0.0044 2.9 0.87
##
## lower alpha upper 95% confidence boundaries
## 0.84 0.85 0.86
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## concentrating 0.81 0.81 0.75 0.59 4.3 0.0060
## hard_working 0.82 0.82 0.76 0.61 4.7 0.0056
## enjoy 0.80 0.80 0.74 0.58 4.1 0.0063
## interest 0.81 0.82 0.75 0.60 4.4 0.0059
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## concentrating 2957 0.83 0.84 0.76 0.70 2.9 1.0
## hard_working 2957 0.82 0.82 0.73 0.67 2.9 1.0
## enjoy 2957 0.85 0.85 0.78 0.72 2.8 1.1
## interest 2957 0.83 0.83 0.75 0.69 2.9 1.1
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## concentrating 0.12 0.2 0.33 0.35 0
## hard_working 0.14 0.2 0.31 0.35 0
## enjoy 0.15 0.2 0.31 0.33 0
## interest 0.14 0.2 0.30 0.37 0
ss1 <- esm %>%
dplyr::select(concentrating, hard_working, enjoy, interest, important, future_goals)
ss1 %>%
correlate() %>%
shave() %>%
fashion()
##         rowname concentrating hard_working enjoy interest important future_goals
## 1 concentrating
## 2  hard_working           .62
## 3         enjoy           .59          .57
## 4      interest           .57          .54   .66
## 5     important           .50          .49   .54      .56
## 6  future_goals           .41          .44   .49      .51       .62
determine_n_factors(ss1)
ssc <- ss1[complete.cases(ss1), ]
library(psych)
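# NOTE: as before, `rotation` is not an fa() argument (it is `rotate`);
# this does not matter for a one-factor solution.
fit1 <- fa(ssc, nfactors=1, rotation="Promax", fm = "pa")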
fit1
## Factor Analysis using method = pa
## Call: fa(r = ssc, nfactors = 1, fm = "pa", rotation = "Promax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 h2 u2 com
## concentrating 0.74 0.54 0.46 1
## hard_working 0.72 0.52 0.48 1
## enjoy 0.79 0.62 0.38 1
## interest 0.78 0.61 0.39 1
## important 0.73 0.54 0.46 1
## future_goals 0.66 0.43 0.57 1
##
## PA1
## SS loadings 3.26
## Proportion Var 0.54
##
## Mean item complexity = 1
## Test of the hypothesis that 1 factor is sufficient.
##
## The degrees of freedom for the null model are 15 and the objective function was 2.81 with Chi Square of 8302.02
## The degrees of freedom for the model are 9 and the objective function was 0.17
##
## The root mean square of the residuals (RMSR) is 0.05
## The df corrected root mean square of the residuals is 0.07
##
## The harmonic number of observations is 2956 with the empirical chi square 255.92 with prob < 5.6e-50
## The total number of observations was 2956 with Likelihood Chi Square = 510.05 with prob < 4.1e-104
##
## Tucker Lewis Index of factoring reliability = 0.899
## RMSEA index = 0.137 and the 90 % confidence intervals are 0.127 0.148
## BIC = 438.12
## Fit based upon off diagonal values = 0.99
## Measures of factor score adequacy
## PA1
## Correlation of (regression) scores with factors 0.94
## Multiple R square of scores with factors 0.88
## Minimum correlation of possible factor scores 0.76
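# Two-factor solution, stored separately so the one-factor fit is not overwritten.
# No `rotate` argument is given, so fa() uses its default oblique rotation (oblimin).
fit2 <- fa(ssc, nfactors=2, fm = "pa")
fit2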
## Factor Analysis using method = pa
## Call: fa(r = ssc, nfactors = 2, fm = "pa")
## Standardized loadings (pattern matrix) based upon correlation matrix
## PA1 PA2 h2 u2 com
## concentrating 0.87 -0.10 0.64 0.36 1.0
## hard_working 0.74 0.02 0.56 0.44 1.0
## enjoy 0.67 0.15 0.62 0.38 1.1
## interest 0.57 0.25 0.60 0.40 1.4
## important 0.14 0.69 0.64 0.36 1.1
## future_goals -0.04 0.81 0.61 0.39 1.0
##
## PA1 PA2
## SS loadings 2.26 1.40
## Proportion Var 0.38 0.23
## Cumulative Var 0.38 0.61
## Proportion Explained 0.62 0.38
## Cumulative Proportion 0.62 1.00
##
## With factor correlations of
## PA1 PA2
## PA1 1.00 0.74
## PA2 0.74 1.00
##
## Mean item complexity = 1.1
## Test of the hypothesis that 2 factors are sufficient.
##
## The degrees of freedom for the null model are 15 and the objective function was 2.81 with Chi Square of 8302.02
## The degrees of freedom for the model are 4 and the objective function was 0.04
##
## The root mean square of the residuals (RMSR) is 0.02
## The df corrected root mean square of the residuals is 0.04
##
## The harmonic number of observations is 2956 with the empirical chi square 34.14 with prob < 7e-07
## The total number of observations was 2956 with Likelihood Chi Square = 104.57 with prob < 1e-21
##
## Tucker Lewis Index of factoring reliability = 0.954
## RMSEA index = 0.092 and the 90 % confidence intervals are 0.077 0.108
## BIC = 72.6
## Fit based upon off diagonal values = 1
## Measures of factor score adequacy
## PA1 PA2
## Correlation of (regression) scores with factors 0.93 0.90
## Multiple R square of scores with factors 0.86 0.81
## Minimum correlation of possible factor scores 0.73 0.62
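To make the comparison between the one- and two-factor solutions for the six items explicit, the fit indices can be recomputed from the likelihood chi-square values, degrees of freedom, and sample size printed above. The formulas below are the textbook forms of RMSEA and of the chi-square-based BIC; they are assumed (not confirmed from the psych source) to match what `fa()` computes up to rounding. The data frame `fits` is introduced here for illustration.

```r
# Values copied from the two six-item fa() outputs above.
fits <- data.frame(
  model = c("one factor", "two factors"),
  chisq = c(510.05, 104.57),  # likelihood chi-square
  df    = c(9, 4),            # model degrees of freedom
  n     = c(2956, 2956)       # total number of observations
)

# RMSEA = sqrt(max((chisq / df - 1) / (n - 1), 0))
fits$rmsea <- with(fits, sqrt(pmax((chisq / df - 1) / (n - 1), 0)))

# BIC = chisq - df * log(n); lower values indicate better fit
fits$bic <- with(fits, chisq - df * log(n))

print(fits, digits = 4)
```

Both indices favor the two-factor solution, although an RMSEA near 0.09 is still above the conventional 0.08 threshold for acceptable fit.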
psych::alpha(ssc)
##
## Reliability analysis
## Call: psych::alpha(x = ssc)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.88 0.88 0.87 0.54 7.1 0.0036 2.8 0.83
##
## lower alpha upper 95% confidence boundaries
## 0.87 0.88 0.88
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## concentrating 0.86 0.86 0.83 0.54 5.9 0.0042
## hard_working 0.86 0.86 0.84 0.55 6.0 0.0042
## enjoy 0.85 0.85 0.83 0.53 5.6 0.0045
## interest 0.85 0.85 0.83 0.53 5.6 0.0045
## important 0.85 0.85 0.83 0.54 5.9 0.0043
## future_goals 0.87 0.87 0.84 0.57 6.5 0.0039
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## concentrating 2956 0.78 0.79 0.73 0.68 2.9 1.0
## hard_working 2956 0.77 0.78 0.72 0.67 2.9 1.0
## enjoy 2956 0.81 0.82 0.77 0.72 2.8 1.1
## interest 2956 0.81 0.81 0.77 0.72 2.9 1.1
## important 2956 0.79 0.79 0.74 0.69 2.7 1.1
## future_goals 2956 0.74 0.74 0.66 0.61 2.5 1.1
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## concentrating 0.12 0.20 0.33 0.35 0
## hard_working 0.14 0.20 0.31 0.35 0
## enjoy 0.15 0.20 0.31 0.33 0
## interest 0.14 0.20 0.30 0.37 0
## important 0.19 0.23 0.29 0.28 0
## future_goals 0.26 0.24 0.26 0.24 0
In terms of takeaways, it looks like we could plausibly add interest to overall engagement (concentrating, hard_working, and enjoy): in the four-item, one-factor solution all loadings fall between 0.74 and 0.80, and alpha is 0.85. With all six items, future_goals has the lowest loading (0.66) in the one-factor solution. We could not fit a two-factor solution with only four items, but with all six, important and future_goals load mainly on a second factor (0.69 and 0.81) rather than on the factor defined by the other four items (0.57 to 0.87). The two factors are strongly correlated (0.74), so the two constructs overlap considerably.
Can we still feel OK about measuring relevance as the mean of importance to you, importance to your future, and use outside of the program? If the alpha for these three looks OK, that’s enough for me here.
ss2 <- esm %>%
dplyr::select(important, future_goals, use_outside)
psych::alpha(ss2)
##
## Reliability analysis
## Call: psych::alpha(x = ss2)
##
## raw_alpha std.alpha G6(smc) average_r S/N ase mean sd
## 0.84 0.84 0.78 0.63 5.2 0.0051 2.6 0.96
##
## lower alpha upper 95% confidence boundaries
## 0.83 0.84 0.85
##
## Reliability if an item is dropped:
## raw_alpha std.alpha G6(smc) average_r S/N alpha se
## important 0.79 0.79 0.65 0.65 3.8 0.0077
## future_goals 0.77 0.77 0.63 0.63 3.3 0.0085
## use_outside 0.77 0.77 0.62 0.62 3.3 0.0085
##
## Item statistics
## n raw.r std.r r.cor r.drop mean sd
## important 2969 0.86 0.86 0.75 0.69 2.7 1.1
## future_goals 2970 0.88 0.87 0.78 0.71 2.5 1.1
## use_outside 2957 0.88 0.87 0.78 0.71 2.6 1.1
##
## Non missing response frequency for each item
## 1 2 3 4 miss
## important 0.19 0.24 0.29 0.28 0
## future_goals 0.26 0.24 0.26 0.24 0
## use_outside 0.22 0.23 0.27 0.28 0
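Given an acceptable alpha for the three items, the relevance composite described above could be formed as a row mean. The sketch below uses a small made-up data frame (`toy`, not the real `esm` data), and averaging over whichever items are non-missing (`na.rm = TRUE`) is an assumption about how missing responses should be handled.

```r
# Made-up responses on the 1-4 scale, for illustration only.
toy <- data.frame(
  important    = c(4, 2, 3, NA),
  future_goals = c(3, 1, 3, 2),
  use_outside  = c(4, 2, NA, 2)
)

rel_items <- c("important", "future_goals", "use_outside")

# Relevance as the mean of the available items in each row.
toy$relevance <- rowMeans(toy[, rel_items], na.rm = TRUE)
toy
```

Dropping `na.rm = TRUE` would instead return `NA` for any response with a missing item, which is the stricter choice.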
Let’s try to construct some composite measures on the PQA. We can start with the things Neil suggested in his email, which I have pasted below: I think we should treat the PQA data in the following manner:
I also made codes for the aspects of data modeling: if the PQA indicator (or sum of indicators) for an aspect was at least one, then that code was set to one (the code below uses `val >= 1`). These codes seem to suggest that students are being supported in doing these activities with some regularity, with each aspect coded as present in roughly 22% to 51% of the rows in the PQA data. I am slightly concerned that data modeling is being considered too generally.
Here is that coding frame:
pqa <- mutate(pqa,
youth_development_overall = active_part_1 + active_part_2 + ho_thinking_1 + ho_thinking_2 + ho_thinking_3 + belonging_1 + belonging_2 + agency_1 + agency_2 + agency_3 + agency_4,
making_observations = stem_sb_8,
data_modeling = stem_sb_2 + stem_sb_3 + stem_sb_9,
interpreting_communicating = stem_sb_6,
generating_data = stem_sb_4,
asking_questions = stem_sb_1)
ggplot(pqa, aes(x = youth_development_overall)) +
geom_histogram()
pqa %>%
dplyr::select(asking_questions, making_observations, generating_data, data_modeling, interpreting_communicating) %>%
gather(key, val) %>%
mutate(val = ifelse(val >= 1, 1, 0)) %>%
group_by(key) %>%
summarize(mean_code = mean(val)) %>%
arrange(desc(mean_code))
## # A tibble: 5 x 2
## key mean_code
## <chr> <dbl>
## 1 data_modeling 0.5127119
## 2 asking_questions 0.3898305
## 3 interpreting_communicating 0.3771186
## 4 making_observations 0.2584746
## 5 generating_data 0.2245763
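One thing to keep in mind when reading these proportions is that `data_modeling` is a sum of three indicators while the other codes rest on a single indicator, so it has more chances to reach the threshold. The sketch below uses made-up values and assumes the indicators are scored 0/1 (which is not confirmed here); it also shows how much the result depends on whether the threshold is "at least one" or "greater than one".

```r
# Made-up 0/1 indicator values, for illustration only.
toy_pqa <- data.frame(
  stem_sb_2 = c(0, 1, 0, 1, 0),
  stem_sb_3 = c(0, 0, 0, 1, 0),
  stem_sb_9 = c(0, 0, 1, 1, 0)
)

# Composite as defined in the coding frame above.
toy_pqa$data_modeling <- with(toy_pqa, stem_sb_2 + stem_sb_3 + stem_sb_9)

# Two possible thresholds for turning the composite into a binary code.
code_at_least_one  <- as.integer(toy_pqa$data_modeling >= 1)
code_more_than_one <- as.integer(toy_pqa$data_modeling > 1)

c(at_least_one = mean(code_at_least_one),
  more_than_one = mean(code_more_than_one))
```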