Social-emotional learning (SEL) programs have recently gained increasing attention. Therefore, it is vital to evaluate the program’s efficacy in promoting social-emotional and academic skills. Low et al. (2019) utilized a two-year dataset from Second Steps that recorded behavioral reports and observations on over 9,000 students from 61 schools and 7 districts.
The longitudinal data allowed the researchers to investigate the type of growth curve models that could explain the developmental trajectories the best. The study found out that in most measurements, students’ performance only improved during school, not summer. Thus, the “summer slide” phenomenon was evident, and a linear growth model was a poor fit.
The researchers continued to use these models to examine the efficacy of the Second Steps program. Students from the experimental group outperformed those in the control group in emotional symptoms, social-emotional scores, skills for learning, and emotional management. However, the program did not show to improve students’ academic performance, academic engagement, and disruptive behavior, and most effect sizes of the results were relatively small.
The current project aims to reproduce the main results from this paper: 1) find the optimal growth curve model for each measurement; and 2) evaluate the program’s efficacy in SEL and academic measures. If possible, I would also like to test the model fit via other measures, though it is outside the scope of my current statistical knowledge.
The linear growth curve model is widely used to estimate student’s performance over time. If it does not accurately describe the pattern like Low et al. suggested (2019), further longitudinal research on student program outcomes should pursue other models.
Additionally, the paper only used one method to measure the model fit. If more measures are available, the results will be more robust.
First, the original data analysis was conducted via SAS, and changing to R could induce discrepancies in results.
Secondly, I have not tried growth curve models, and multilevel analysis was still a little new to me, so I need help from either the class or the authors.
Thirdly, I have not encountered the implementation used to account for missing data in the paper, so I need to learn more about this method and its implications.
Project repository (on Github): https://github.com/psych251/low2019.git
Original paper (as hosted in your repo): https://github.com/psych251/low2019/blob/cbde9963c81eae601a1597c40d561b6119235435/original_paper/low2019.pdf
(I haven’t received the data, so these are all conjectures)
First, I will review the datasets and the codebook, and remove irrelevant variables.
Then, I will use FIML estimation to to impute missing data. If time allows, I will also reproduce the procedure of finding the variables that correlate with missingness.
Next, I will conduct a round of exploratory analysis, such as finding the number of students involved in each group and average change in scores over time.
The next part is to reproduce the main result – finding the fit for eight growth curve models. I will first specify the parameters of each model, use R packages (maybe lavaan?) to estimate the models, and find the best fit with \(\Delta AICc\) and Akaike weights, as the paper suggested. Potentially I could use other methods to compare the fit of the models, to ensure the robustness of the result.
If time allows, I will also reproduce the main effect of the intervention, using the best fit models from (4).
The authors used SAS to run the analyses. Since I will use R, I expect some discrepencies in results.
The first author mentioned to me that they made nuanced decisions about missing data, imputations and outliers that might not be explicit in the paper.
I expect that these two differences will make it hard to reproduce the exact values in the results. However, if the results are robust enough, I should be able to come to the same conclusions (i.e. what model is the best fit for each DV?), which is my goal of this project.
Identify all relevant variables.
Recalculate the reliability of key outcome measures. (cronbach’s alpha should be identical to the numbers reported in the paper)
Missing data should be adequately imputed. Question: if missingness is systematic (e.g. one teacher didn’t fill out assessments for all of their students), is it ok to impute missing data, or should the student data be dropped? (Either using the author’s method, or finding an alternative; I can’t find a way to measure whether it’s successful)
The models should be constructed correctly with all nested variables specified. (Coefficients should be similar to the paper’s results)
I am still at Step 1. The dataset is unexpectedly huge (1300+ columns!!), the analysis script is unavailable, and the codebook is not clear enough. I have to figure out a lot of things by myself. Below is my progress of summarizing variables of interest.
Year 1: Fall 2012, Spring 2013
Year 2: Fall 2013, Spring 2014
Variables:
Each variable is nested within the later variable.
The condition (Delayed Start vs. Early Start) is manipulated on the school level.
Variable:
36 items, teacher-reported rating scale of student’s social emotional competencies
Range: 0 - 4
Subscales:
Variables:
25 items, teacher-reported scale of student’s behavior
Range: 0 (not true) - 2 (certainly true)
Subscales:
Variables:
Percent intervals of academic engagement time, from direct observation
Range: 0 - 1
Variable:
Percent intervals of disruptive behavior, from direct observation
Range: 0 - 1
Percentage correct on M-CBM math assessment
no granular data
Variable:
Oral Reading Fluency Words Reading Correct Per Minute
“Because reading instruction focused on connected text generally does not begin until first grade, we expected different growth patterns for R-CBM among students who began the study in kindergarten than those who began the study in Grades 1 or 2.”
Range: (Spring 2013) 0 - 229
Variable:
Questions I have now:
Some students were taught by different teachers in Year 1 and Year 2. What should I do??
Sometimes, students’ grade didn’t increment by 1 as they went from Year 1 to Year 2.
Many variables that will be interesting to look at have a lot of missing values (about 1/3). Is it not a good idea to use them?
How should the dataframe be structured? I assume that each student should have four rows, one for each time point. Each row contains public variables like studentID, gender, etc, and student’s outcomes for each time point.
First, I encountered problems when checking the item reliabilities using the psych package. Mike suggested the agreement package, but there isn’t enough explanation about this package for me to implement. For example, I’m not sure which model to use in the dim_icc function. Also, this function seems to calculate inter-rater reliability, and I’m not sure if it can be translated directly to a longitudinal dataset.
Then, I tried to work on fitting the models. I found out the package lavaan has an argument about how to deal with missing data, and full information maximum likelihood estimation is one of the methods. However, I still have the following questions:
I can’t find the way to specify auxillary variables for FIML. Though the author mentioned that there isn’t a lot of difference in adding the auxillary variables vs. not, I am still curious if there’s a way I could implement it in R.
FIML is embedded in a modeling function in lavaan. For example, I used the growth function below, but I don’t know if that’s the correct function I should use. The best way is to retrieve the modified dataset after FIML estimation, but to my knowledge lavaan can’t do this.
I fit a model below, but I need more help interpreting what each coefficient and variable mean. I’d also like to know how to manipulate this model to achieve diferent piecewise models described in the paper. I also need help understanding the output.
Finally, I’ve found a package that can calculate AICc for me (MuMIn), and it works with a lavaan object.
Data preparation following the analysis plan.
library(here)
## Warning: package 'here' was built under R version 4.0.4
## here() starts at C:/Users/iris_/OneDrive - Stanford/Stanford/Autumn 2021/PSYCH 251/low2019
data <- read_csv(here("data", "data_reliability.csv")) %>%
mutate(StudentID = as.character(StudentID))
##
## -- Column specification --------------------------------------------------------
## cols(
## .default = col_double(),
## Condition = col_character(),
## Site = col_character(),
## District = col_character(),
## StudentGender = col_character(),
## StudentRaceS = col_character(),
## StudentGrade = col_character(),
## DESSALowHalf = col_character(),
## DESSALowHalfY2 = col_character(),
## DESSAsecSignif = col_character(),
## DESSAslSignif = col_character(),
## DESSAsaeSignif = col_character(),
## DESSAemSignif = col_character(),
## DESSApsSignif = col_character(),
## SDQesSignif = col_character(),
## SDQcpSignif = col_character(),
## SDQhsSignif = col_character(),
## SDQppSignif = col_character(),
## SDQpsSignif = col_character(),
## ODBSignif = col_character()
## )
## i Use `spec()` for the full column specifications.
dessa_wide <- data %>%
dplyr::select(StudentID, SchoolID, Condition, ADESSA1:ADESSA36, BDESSA1:BDESSA36,
CDESSA1:CDESSA36, DDESSA1:DDESSA36)
dessa <- dessa_wide %>%
pivot_longer(cols = -c(StudentID, SchoolID, Condition), names_pattern = "^(.)(.*)$",
names_to = c("time", "item")) %>%
pivot_wider(id_cols = c(StudentID, SchoolID, Condition, time), names_from = item,
values_from = value, names_repair = "check_unique")
dessa_skill <- dessa %>%
dplyr::select(StudentID, SchoolID, Condition, time, DESSA1, DESSA6, DESSA7,
DESSA9, DESSA12, DESSA14, DESSA16, DESSA20, DESSA23,
row.names = NULL) %>%
mutate(time = as.factor(case_when(time == "A" ~ 1,
time == "B" ~ 2,
time == "C" ~ 3,
time == "D" ~ 4)))
dessa_skill_avg <- dessa_skill %>%
pivot_longer(cols = -c(StudentID, SchoolID, Condition, time),
names_to = c("item")) %>%
group_by(StudentID, SchoolID, Condition, time) %>%
summarize(dessa_skill_score = mean(value, na.rm = T))
## `summarise()` regrouping output by 'StudentID', 'SchoolID', 'Condition' (override with `.groups` argument)
dessa_skill_avg_wide <- dessa_skill_avg %>%
pivot_wider(names_from = time, names_prefix = "t",
values_from = dessa_skill_score)
Plotting and exploration
ggplot(dessa_skill_avg,
aes(x = time, y = dessa_skill_score, col = Condition)) +
geom_line(aes(group = StudentID),
alpha = .05, width = .25) +
stat_summary(aes(group = Condition), geom = "line") +
theme_bw()
## Warning: Ignoring unknown parameters: width
## Warning: Removed 12660 rows containing non-finite values (stat_summary).
## No summary function supplied, defaulting to `mean_se()`
## Warning: Removed 12120 row(s) containing missing values (geom_path).
# geom_smooth(aes(group = Condition), method = "lm")
Mike prototype in lmer
For students within intervention schools, we coded time as (A) 0, 1, 2, 3 across time for linear slopes; (B) 0, 1, 1, 2 for within-year, stepped growth; (C) 0, 1, 1, 1 for initial gains with stable scores thereafter; and (D) 0, 1, 0, 1 for gains during the school years and a decline between years.
dessa_skill_avg$time_A <- as.numeric(dessa_skill_avg$time)-1
dessa_skill_avg$time_B <- dessa_skill_avg$time_A
dessa_skill_avg$time_B[dessa_skill_avg$time_A > 1] <-
dessa_skill_avg$time_A[dessa_skill_avg$time_A > 1] - 1
dessa_skill_avg$time_C <- dessa_skill_avg$time_B
dessa_skill_avg$time_C[dessa_skill_avg$time_B > 1] <- 1
dessa_skill_avg$time_D <- dessa_skill_avg$time_A
dessa_skill_avg$time_D[dessa_skill_avg$time_A > 1] <-
dessa_skill_avg$time_D[dessa_skill_avg$time_A > 1] - 2
mod_A <- lmer(dessa_skill_score ~ time_A +
(time_A | SchoolID) + (1 | StudentID),
data = dessa_skill_avg)
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00495663 (tol = 0.002, component 1)
mod_B <- lmer(dessa_skill_score ~ time_B +
(time_B | SchoolID) + (1 | StudentID),
data = dessa_skill_avg)
mod_C <- lmer(dessa_skill_score ~ time_C +
(time_C | SchoolID) + (1 | StudentID),
data = dessa_skill_avg)
mod_D <- lmer(dessa_skill_score ~ time_D +
(time_D | SchoolID) + (1 | StudentID),
data = dessa_skill_avg)
AIC(mod_A)
## [1] 47812.27
Functionalize!
run_lmer <- function(time_var) {
mod_formula <- paste("dessa_skill_score ~",time_var,"+
(", time_var, "| SchoolID) + (1 | StudentID)")
mod <- lmer(mod_formula,
data = dessa_skill_avg)
return(mod)
}
sim <- tibble(time_vars = c("time_A","time_B","time_C","time_D")) %>%
mutate(mod = map(time_vars, run_lmer)) %>%
mutate(aic = map(mod, AIC)) %>%
select(-mod) %>%
unnest(cols = c(aic))
## Warning: Problem with `mutate()` input `mod`.
## i Model failed to converge with max|grad| = 0.00495663 (tol = 0.002, component 1)
## i Input `mod` is `map(time_vars, run_lmer)`.
## Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
## Model failed to converge with max|grad| = 0.00495663 (tol = 0.002, component 1)
Try https://cran.r-project.org/web/packages/AICcmodavg/AICcmodavg.pdf for AICc?
Iris’s lavaan version
After meeting with Mike, I was able to use lmer4 to fit the linear mix-effect models in the paper. (See results in data/main_analysis.rmd). Since lmer4 has a built-in argument to turn on maximum likelihood, the problem of missing data was resolved. Becuase there were too many outcome variables, map was used to iterate the model fitting process. I’ve now got results that are close to what I need, but here are a few things I have to do:
I’m still using AIC instead of AICc, and it’s now time to switch to AICc.
ORFPM (reading measure) is split by grade (Kindergarten vs. K-12) in the paper, and I need to recalculate AICc for this measure after a group_by.
(will need help) The current models didn’t put condition as an IV into the formula. I think it’ll be an easy thing to do for Model Set 2 in the paper, where both conditions adopt the same assumptions about the curve pattern. I’m unsure how to estimate Model Set 1, where the control condition is always fit by a linear model.
(probably need help) There is one more model that hasn’t been created yet, which assumes a quadratic relationship for both conditions. I’m guessing that it can be achieved by manipulating the time variables again to make it 1^2, 2^2, 3^2 and 4^2? But I need some confirmation here.
The analyses as specified in the analysis plan.
Side-by-side graph with original graph is ideal here
###Exploratory analyses
Any follow-up analyses desired (not required).
Open the discussion section with a paragraph summarizing the primary result from the key analysis and assess whether you successfully reproduced it, partially reproduced it, or failed to reproduce it.
Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis of the dataset, (b) assessment of the meaning of the successful or unsuccessful reproducibility attempt - e.g., for a failure to reproduce the original findings, are the differences between original and present analyses ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the reproducibility attempt (if you contacted them). None of these need to be long.