This document is structured as follows: first, some background on the analysis (we also have a much longer version that we can pull from when we write this up, saved in manuscript/draft-of-paper.Rmd); this background is also in the README.
A strength of intensive data collection methods like the Experience Sampling Method (ESM; Hektner, Schmidt, & Csikszentmihalyi, 2007) is that they enable researchers to examine participants’ immediate experience and model changes in these experiences over time while accounting for multiple dependencies in the data. A current analytic challenge for researchers using this type of data involves examining how participants’ responses repeatedly measured throughout data collection cumulatively relate to some single, longer-term outcome.
Recently, out-of-school-time programs focusing on STEM have proliferated to combat declines in STEM interest during adolescence (National Academy of Engineering and National Research Council, 2014). Though many have argued that contexts for learning outside of school have an important role to play in youths’ development of interest (Hidi, Renninger, & Krapp, 2016), relatively little is known about whether and how youths’ interest develops in such contexts. Contemporary motivational theory suggests that interests emerge from the interactions of an individual in a particular environment, rather than residing entirely within the individual (Hidi et al., 2016). Thus it is essential to understand the ways that individuals engage with STEM-focused environments to know how STEM-related interests may emerge.
The purpose of this study was to explore the utility of a particular analytic method for testing the effects of sustained engagement in summer STEM programs on the development of youths’ interest over time. The participants for the study were 203 racially and ethnically diverse youth in the Northeast United States. To determine how youths’ in-the-moment engagement related to their individual interest in STEM, we specified a multivariate model estimated with tools associated with Bayesian methods (namely, Markov Chain Monte Carlo [MCMC]). The model was estimated using the MCMCglmm R package (Hadfield, 2010) and includes both youths’ in-the-moment engagement (measured via ESM) and their post-program interest as jointly modeled outcomes. This approach contrasts with a multilevel modeling approach in which two separate models are specified, which can contribute to overconfident inferences about effects (Houslay & Wilson, 2017). A further feature of MCMC estimation is that it accommodates complex data structures, which can be challenging to do with a latent variable modeling approach.
The analysis showed that youths’ in-the-moment engagement was a significant, positive predictor (effect size r = .27) of youths’ post-program interest in STEM, accounting for their initial interest in STEM, their gender, and the nesting and cross-classification of youths’ responses. This effect is more conservative than the one found in analyses carried out with two separate models (r = .34; Rosenberg, Beymer, & Schmidt, 2018). Future work can extend the methodological approach to examine both the (between-youth) effects of engagement on post-program interest and (within-youth) relations between engagement and its rate of change, and the effects of both on interest development. This study demonstrates how MCMC and Bayesian methods can be a natural fit when the goal is to study multivariate motivational processes in real-world settings, leading to novel insights into learning and development.
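In rough notation (ours, not taken from the manuscript), the bivariate model fit with MCMCglmm later in this document looks like:

\[
\begin{aligned}
\text{engagement}_{ti} &= \beta^{(e)}_{0} + \beta^{(e)}_{1}\,\text{pre-interest}_{i} + \beta^{(e)}_{2}\,\text{female}_{i} + u^{(e)}_{i} + \varepsilon_{ti} \\
\text{post-interest}_{i} &= \beta^{(p)}_{0} + \beta^{(p)}_{1}\,\text{pre-interest}_{i} + \beta^{(p)}_{2}\,\text{female}_{i} + u^{(p)}_{i} \\
\begin{pmatrix} u^{(e)}_{i} \\ u^{(p)}_{i} \end{pmatrix} &\sim N\left( \mathbf{0}, \mathbf{G} \right)
\end{aligned}
\]

where the off-diagonal element of \(\mathbf{G}\) captures the between-youth association between momentary engagement and post-program interest; dividing it by the product of the two participant-level standard deviations gives the correlation reported in the MCMCglmm section below.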
What follows are the results of the analysis. The first section uses lme4; the second uses MCMCglmm.
library(tidyverse)
library(MCMCglmm)
library(lme4)
# install.packages("r2glmm")
library(r2glmm) # this is used, for now
# install.packages("konfound") # not used now
# library(konfound)
# install.packages("here")
library(here)
## here() starts at /Users/joshuarosenberg/Documents/modeling-changes-in-interest
d_red <- read_csv("processed-data/data-to-model.csv")
## Parsed with column specification:
## cols(
## participant_ID = col_double(),
## program_ID = col_double(),
## rm_engagement = col_double(),
## pre_interest = col_double(),
## post_interest = col_double(),
## gender_female = col_double()
## )
Note/question for Tom: For the first line in the next chunk (m1a <- lmer(rm_engagement ~ 1 + pre_interest + gender_female + (1 | participant_ID), data = d_red)), we are wondering whether it is okay that pre-interest and gender are repeated across rows here (even though they are measured only once per individual). For post-interest (which is also measured only once per individual), you wrote some code that uses each individual’s first response as their post-interest measure; a sketch of that step is below.
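For reference, here is a minimal sketch of what that first-response coding might look like (hypothetical: the data frame name d_raw and this exact approach are our assumptions, and the actual prep happens upstream of processed-data/data-to-model.csv).

# Hypothetical sketch: keep the once-measured post_interest value on each
# participant's first ESM row and set it to NA elsewhere, so the multivariate
# model treats it as a single participant-level measurement (assumes a long
# data frame `d_raw` with one row per ESM response; tidyverse is loaded above)
d_prepped <- d_raw %>%
  group_by(participant_ID) %>%
  mutate(post_interest = if_else(row_number() == 1, post_interest, NA_real_)) %>%
  ungroup()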
Note to consider later: We are filtering the data frame before we model it with lme4 to include only those obs. with a post-interest measure (and a value for the gender var - though we’re only missing gender var for two participants). Is this necessary/important? I suspect so, but just flagging.
# Step 1 of the two-step (BLUP) approach: model momentary engagement with a
# random intercept for each participant
m1a <- lmer(rm_engagement ~ 1 + pre_interest + gender_female + (1 | participant_ID), data = d_red)

# Extract each participant's random-intercept estimate (BLUP) for engagement
d_BLUP <- ranef(m1a) %>%
  pluck(1) %>%
  rownames_to_column("participant_ID") %>%
  mutate(participant_ID = as.integer(as.character(participant_ID))) %>%
  rename(rm_engagement_BLUP = `(Intercept)`) %>%
  as_tibble()
# One row per participant (level 2), joined with the engagement BLUPs
d_ind_level_2 <- distinct(d_red, participant_ID, program_ID, .keep_all = TRUE)
d_for_m1b <- left_join(d_ind_level_2, d_BLUP, by = "participant_ID") # 203 obs
d_for_m1b_filtered <- filter(d_for_m1b, !is.na(post_interest) & !is.na(gender_female)) # only 141
# Step 2: regress post-program interest on the engagement BLUPs plus covariates;
# lm() drops rows with missing values by default, so m1b uses the 141 complete cases
m1b <- lm(post_interest ~ 1 + rm_engagement_BLUP + gender_female + pre_interest, data = d_for_m1b)
# konfound(m1b, rm_engagement_BLUP)
# konfound(m1b, pre_interest)
summary(m1b)
# konfound::konfound(m1b)
# Effect size for the BLUP term: take the square root of its Rsq value from
# r2glmm::r2beta() to express it on a correlation scale
partial_corr <- r2glmm::r2beta(m1b) %>%
  filter(Effect == "rm_engagement_BLUP") %>%
  select(Rsq) %>%
  sqrt() %>%
  round(3)
##
## Call:
## lm(formula = post_interest ~ 1 + rm_engagement_BLUP + gender_female +
## pre_interest, data = d_for_m1b)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.97365 -0.35127 0.03355 0.42297 1.79135
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.24076 0.23397 5.303 4.45e-07 ***
## rm_engagement_BLUP 0.46171 0.10915 4.230 4.25e-05 ***
## gender_female -0.14041 0.11879 -1.182 0.239
## pre_interest 0.62788 0.06884 9.121 8.37e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.6976 on 137 degrees of freedom
## (62 observations deleted due to missingness)
## Multiple R-squared: 0.4309, Adjusted R-squared: 0.4184
## F-statistic: 34.58 on 3 and 137 DF, p-value: < 2.2e-16
Note to Tom: We use the results of r2glmm::r2beta() to calculate the correlation between RM engagement (BLUP) and post-interest, taking the square root of the Rsq value for that term.

\( r_{\text{BLUP, post-interest}} \) = 0.34, accounting for youths’ initial interest in STEM, their gender, and the nesting of youths’ responses within participants.
RM engagement (w/ BLUP) seems important; pre-interest seems important; gender (female) does not seem important.
d_for_m2 <- filter(d_red, !is.na(gender_female) & !is.na(pre_interest)) # if there are missing vals in the fixed predictors, MCMCglmm gives a warning
# Priors for the bivariate model: the residual variance for the second trait
# (post_interest, measured once per participant) is fixed at a small value,
# and the participant-level covariance matrix gets a parameter-expanded prior
prior <- list(R = list(V = diag(c(1, 0.0001), 2, 2), nu = 0.002, fix = 2),
              G = list(G1 = list(V = diag(2), nu = 2,
                                 alpha.mu = rep(0, 2),
                                 alpha.V = diag(25^2, 2, 2))))
m2 <- MCMCglmm(fixed = cbind(rm_engagement, post_interest) ~ trait - 1 +
                 trait:gender_female +
                 trait:pre_interest,
               random = ~ us(trait):participant_ID, # unstructured participant-level covariance
               rcov = ~ idh(trait):units,           # trait-specific, uncorrelated residuals
               family = rep("gaussian", 2),
               data = as.data.frame(d_for_m2),
               prior = prior,
               burnin = 3000,
               nitt = 43000,
               thin = 40,
               verbose = TRUE)
## Warning: 'cBind' is deprecated.
## Since R version 3.2.0, base's cbind() should work fine with S4 objects
## 
## MCMC iteration = 0
## ...
## MCMC iteration = 43000
summary(m2)
plot(m2)

# Between-participant correlation between engagement and post-interest:
# participant-level covariance divided by the product of the two
# participant-level standard deviations, computed for each MCMC sample
m2_cor <- m2$VCV[, "traitpost_interest:traitrm_engagement.participant_ID"] /
  (sqrt(m2$VCV[, "traitpost_interest:traitpost_interest.participant_ID"]) *
     sqrt(m2$VCV[, "traitrm_engagement:traitrm_engagement.participant_ID"]))
posterior.mode(m2_cor)
plot(m2_cor)
HPDinterval(m2_cor)
##
## Iterations = 3001:42961
## Thinning interval = 40
## Sample size = 1000
##
## DIC: 9872.907
##
## G-structure: ~us(trait):participant_ID
##
## post.mean l-95% CI
## traitrm_engagement:traitrm_engagement.participant_ID 0.3424 0.26229
## traitpost_interest:traitrm_engagement.participant_ID 0.1282 0.05124
## traitrm_engagement:traitpost_interest.participant_ID 0.1282 0.05124
## traitpost_interest:traitpost_interest.participant_ID 0.6539 0.52609
## u-95% CI eff.samp
## traitrm_engagement:traitrm_engagement.participant_ID 0.4171 1000.0
## traitpost_interest:traitrm_engagement.participant_ID 0.2095 102.3
## traitrm_engagement:traitpost_interest.participant_ID 0.2095 102.3
## traitpost_interest:traitpost_interest.participant_ID 0.8094 800.7
##
## R-structure: ~idh(trait):units
##
## post.mean l-95% CI u-95% CI eff.samp
## traitrm_engagement.units 0.4091 0.3861 0.4296 1205
## traitpost_interest.units 0.0001 0.0001 0.0001 0
##
## Location effects: cbind(rm_engagement, post_interest) ~ trait - 1 + trait:gender_female + trait:pre_interest
##
## post.mean l-95% CI u-95% CI eff.samp
## traitrm_engagement 2.55754 2.20916 2.87375 1000.00
## traitpost_interest 1.72452 1.21548 2.21072 23.01
## traitrm_engagement:gender_female -0.06719 -0.25639 0.10186 1097.71
## traitpost_interest:gender_female -0.18472 -0.44037 0.05157 217.85
## traitrm_engagement:pre_interest 0.10641 0.01278 0.20522 1000.00
## traitpost_interest:pre_interest 0.48458 0.35264 0.65731 26.07
## pMCMC
## traitrm_engagement <0.001 ***
## traitpost_interest <0.001 ***
## traitrm_engagement:gender_female 0.456
## traitpost_interest:gender_female 0.142
## traitrm_engagement:pre_interest 0.028 *
## traitpost_interest:pre_interest <0.001 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## var1
## 0.2952195
## lower upper
## var1 0.1055799 0.406185
## attr(,"Probability")
## [1] 0.95
RM engagement seems important; pre-interest seems important; gender (female) does not seem important. The effect of RM engagement is smaller than the one estimated with the BLUP approach: from the proposal, we found \( r_{\text{rm-engagement, post-interest}} = .27 \), smaller than the .34 found with the BLUP.
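For reference, the correlation computed from m2$VCV above is (in our notation)

\[
r = \frac{\operatorname{cov}\left( u^{(e)}_{i}, u^{(p)}_{i} \right)}{\sqrt{\operatorname{var}\left( u^{(e)}_{i} \right)}\,\sqrt{\operatorname{var}\left( u^{(p)}_{i} \right)}}
\]

where \(u^{(e)}_{i}\) and \(u^{(p)}_{i}\) are the participant-level effects for engagement and post-interest; the ratio is computed for each retained MCMC sample, and the posterior mode and 95% HPD interval are reported above.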
Note to consider later: adding the program_ID var to the model above (e.g., to account for the nesting of youth within programs).