1 Introduction

This document contains the code for my Technical Comment on Berkowitz et al. (2015). For a fuller analysis, see this repository.
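The code below assumes the following packages are loaded; multi_boot_standard() is assumed to come from the langcog helper package.

library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(ggthemes)  # scale_fill_solarized / scale_colour_solarized
library(lme4)
library(broom)
library(knitr)
library(langcog)   # assumed source of multi_boot_standard()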

# read the raw accessory data and label the two conditions
d <- read.csv("aac7427-Accessory-Data-File-S1.csv") %>%
  mutate(condition = ifelse(cond.dum == 1, "Math App", "Reading App"))
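As a quick sanity check (my addition, not in the original write-up), the number of children in each condition:

table(d$condition)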

2 Exclusions and Data Preparation

We need to exclude twins and children who dropped out during year 1 (the twin and year1dropout variables).

We also need to exclude the scores of 34 children due to experimenter error (24 in the fall: 16 math and 8 reading; 10 in the spring: 9 math and 1 reading). This information is encoded in the filter variables (e.g., wjappliedfa13_filter): binary variables that presumably indicate whether a child's score should be included (e.g., because basal and ceiling items were administered properly). We apply these filters after excluding twins and dropouts.

sum(d$wjappliedfa13_filter[d$twin == 0 & d$year1dropout == 0] == 0)
## [1] 24
sum(d$WJ.applied.Sp14.filter[d$twin == 0 & d$year1dropout == 0] == 0)
## [1] 11

These numbers are very close to the paper's, though we get 11 in the spring rather than 10. Let's do the same for reading, where there were 25 exclusions (7 in the fall: 4 math and 3 reading; 18 in the spring: 12 math and 6 reading).

sum(d$WJletterwordFa13_filter[d$twin == 0 & d$year1dropout == 0] == 0)
## [1] 7
sum(d$WJ.letterword.Sp14.filter[d$twin == 0 & d$year1dropout == 0] == 0)
## [1] 19

Again we're one off (we get 19 in the spring where the paper reports 18), but quite close. Note that there is a separate exclusion issue for analyses involving math anxiety: where the math anxiety questionnaire data are missing (~20% of cases), those children must be excluded. I assume the authors simply dropped non-responding families for these analyses.
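A quick way to check that missingness rate (my addition), assuming ParentMAaverage.Fa13 (used below) is the relevant questionnaire variable:

# proportion of included (non-twin, non-dropout) children with no
# parent math anxiety data
mean(is.na(d$ParentMAaverage.Fa13[d$twin == 0 & d$year1dropout == 0]))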

# Apply the score filters, drop twins and year-1 dropouts, and reshape
# to long format (one row per child x measure x time point).
dt.excl <- d %>%
  mutate(applied.w.fa13 = ifelse(wjappliedfa13_filter,
                                 WJ.Applied.Problems.W.Score.Fa13,
                                 NA),
         applied.w.sp14 = ifelse(WJ.applied.Sp14.filter,
                                 WJ.Applied.Problems_W.Score_Sp14,
                                 NA),
         reading.w.fa13 = ifelse(WJletterwordFa13_filter,
                                 WJ.Letter.Word.ID.W.Score.Fa13,
                                 NA),
         reading.w.sp14 = ifelse(WJ.letterword.Sp14.filter,
                                 WJ.LetterWord.Wscore.Sp14,
                                 NA)) %>%
  filter(!twin, !year1dropout) %>%
  gather(measure, score,
         applied.w.fa13, applied.w.sp14,
         reading.w.fa13, reading.w.sp14) %>%
  # split e.g. "applied.w.fa13" into "applied." and "fa13"
  separate(measure, c("measure", "time"), sep = "w\\.") %>%
  mutate(year = as.factor(str_sub(time, 4, 5)),
         measure = ifelse(grepl("applied", measure), "Math", "Reading"))
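As a sanity check on the reshaping (my addition), every included child should now contribute exactly four rows, one per measure x time point:

# any child with a row count other than 4 indicates a reshaping problem
dt.excl %>% count(ChildID) %>% filter(n != 4)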

Let's detour briefly to check the means now that we are applying the exclusions. Table S1 of the original paper gives means separately for low- and high-anxious parents (a median split on math anxiety, i.e., not grand means), so we group by the median-split variable as well.

dt.excl %>%
  filter(measure == "Math") %>%
  filter(!is.na(parentMA_mediansplit)) %>%
  group_by(parentMA_mediansplit, condition, year) %>%
  summarise(mean = mean(score, na.rm=TRUE), 
            sd = sd(score, na.rm=TRUE))
## Source: local data frame [8 x 5]
## Groups: parentMA_mediansplit, condition [?]
## 
##   parentMA_mediansplit   condition   year     mean       sd
##                  (dbl)       (chr) (fctr)    (dbl)    (dbl)
## 1                    1    Math App      3 462.7468 18.71507
## 2                    1    Math App      4 478.4348 21.11036
## 3                    1 Reading App      3 458.8571 18.08212
## 4                    1 Reading App      4 475.3134 19.25602
## 5                    2    Math App      3 459.0562 15.98633
## 6                    2    Math App      4 474.3836 18.67978
## 7                    2 Reading App      3 457.4800 13.56352
## 8                    2 Reading App      4 469.2157 18.68402

We get only minor numerical differences between the data I'm using and the means reported by the authors (at most 1 W-score unit).
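For comparison with the median-split cells above, the grand means by condition and year (not reported in Table S1) can be computed the same way:

dt.excl %>%
  filter(measure == "Math") %>%
  group_by(condition, year) %>%
  summarise(mean = mean(score, na.rm = TRUE),
            sd = sd(score, na.rm = TRUE))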

Difference (gain) scores, spring minus fall.

# spread to one row per child, compute gains, and re-gather
diffs.excl <- dt.excl %>%
  select(ChildID, condition, year, measure, score) %>%
  mutate(measure_year = interaction(measure, year)) %>%
  select(-year, -measure) %>%
  spread(measure_year, score) %>%
  mutate(math_gain = Math.4 - Math.3,
         reading_gain = Reading.4 - Reading.3) %>%
  select(-Math.3, -Math.4, -Reading.3, -Reading.4) %>%
  gather(measure, score, math_gain, reading_gain)
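Gains are NA whenever either time point was filtered out; a quick count of missing gain scores by measure (my addition):

diffs.excl %>%
  group_by(measure) %>%
  summarise(n_missing = sum(is.na(score)), n_total = n())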

Now do the whole thing again with grade equivalents (GE).

# Same pipeline using grade-equivalent (GE) scores; the ".w." in the
# temporary column names is kept so the same separate() step applies.
dt.excl.ge <- d %>%
  mutate(applied.w.fa13 = ifelse(wjappliedfa13_filter,
                                 WJ.Applied.Problems.GE.Fa13,
                                 NA),
         applied.w.sp14 = ifelse(WJ.applied.Sp14.filter,
                                 WJ.Applied.Problem_GE_Sp14,
                                 NA),
         reading.w.fa13 = ifelse(WJletterwordFa13_filter,
                                 WJ.Letter.Word.ID.GE.Fa13,
                                 NA),
         reading.w.sp14 = ifelse(WJ.letterword.Sp14.filter,
                                 WJ.LetterWord_GE_Sp14,
                                 NA)) %>%
  filter(!twin, !year1dropout) %>%
  gather(measure, score,
         applied.w.fa13, applied.w.sp14,
         reading.w.fa13, reading.w.sp14) %>%
  separate(measure, c("measure", "time"), sep = "w\\.") %>%
  mutate(year = as.factor(str_sub(time, 4, 5)),
         measure = ifelse(grepl("applied", measure), "Math", "Reading"))

diffs.excl.ge <- dt.excl.ge %>%
  select(ChildID, condition, year, measure, score) %>%
  mutate(measure_year = interaction(measure, year)) %>%
  select(-year, -measure) %>%
  spread(measure_year, score) %>%
  mutate(math_gain = Math.4 - Math.3,
         reading_gain = Reading.4 - Reading.3) %>%
  select(-Math.3, -Math.4, -Reading.3, -Reading.4) %>%
  gather(measure, score, math_gain, reading_gain) 

And join the other measures (parent math anxiety and app usage) back in.

diffs.full <- left_join(diffs.excl, d %>% 
                          select(ChildID, ParentMAaverage.Fa13,
                                 parentMA_mediansplit, 
                                 avg.use, use.012.groups)) %>%
  left_join(diffs.excl.ge %>% rename(score.ge = score))
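These joins rely on dplyr matching on all shared columns; an equivalent version with the keys made explicit (assuming the column sets described above) is:

diffs.full <- left_join(diffs.excl,
                        d %>% select(ChildID, ParentMAaverage.Fa13,
                                     parentMA_mediansplit,
                                     avg.use, use.012.groups),
                        by = "ChildID") %>%
  left_join(diffs.excl.ge %>% rename(score.ge = score),
            by = c("ChildID", "condition", "measure"))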

And compute means for plotting.

# multi_boot_standard() returns the mean and a bootstrapped 95% CI
# of `score` within each group
ms.diffs.excl <- diffs.excl %>%
  group_by(condition, measure) %>%
  multi_boot_standard(column = "score", na.rm = TRUE)

ms.diffs.excl.ge <- diffs.excl.ge %>%
  group_by(condition, measure) %>%
  multi_boot_standard(column = "score", na.rm=TRUE) %>%
  ungroup %>%
  mutate(measure = factor(measure, 
                            levels = c("math_gain","reading_gain"), 
                            labels = c("WJ Applied Problems\n(Math)", 
                                       "WJ Letter-Word\n(Reading)")), 
         Condition = condition)
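For readers without the langcog package, a minimal stand-in for multi_boot_standard() would look like the sketch below (a percentile bootstrap with 1000 resamples; the resampling details are my assumption, not langcog's documented defaults):

# bootstrapped mean and 95% percentile CI of `score` within each group
boot_ci <- function(x, n_boot = 1000) {
  x <- x[!is.na(x)]
  boots <- replicate(n_boot, mean(sample(x, replace = TRUE)))
  data.frame(mean = mean(x),
             ci_lower = unname(quantile(boots, .025)),
             ci_upper = unname(quantile(boots, .975)))
}

diffs.excl %>%
  group_by(condition, measure) %>%
  do(boot_ci(.$score))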

3 Technical Comment Analyses

This section reproduces the analyses from the Technical Comment.

Figure 1.

ggplot(ms.diffs.excl.ge, aes(x = measure, y = mean, 
               ymin = ci_lower, ymax = ci_upper, 
               fill = Condition)) +
  geom_bar(stat="identity", position = "dodge") +
  geom_linerange(position = position_dodge(width = .9)) + 
  scale_fill_solarized() + 
  ylab("Improvement (Grade Equivalents)") + 
  xlab("Measure") + ylim(c(0,1)) 

t-tests for Figure 1.

kable(tidy(with(diffs.excl.ge, 
                t.test(score[condition == "Math App" & 
                               measure == "math_gain"],
                       score[condition == "Reading App" & 
                               measure == "math_gain"], var.equal = TRUE))))
| estimate1 | estimate2 | statistic |   p.value | parameter |   conf.low | conf.high |
|----------:|----------:|----------:|----------:|----------:|-----------:|----------:|
|  0.852973 |      0.77 |  1.061383 | 0.2890304 |       498 | -0.0706193 | 0.2365653 |
kable(tidy(with(diffs.excl.ge, 
                t.test(score[condition == "Math App" & 
                               measure == "reading_gain"],
                       score[condition == "Reading App" & 
                               measure == "reading_gain"], var.equal = TRUE))))
| estimate1 | estimate2 | statistic |   p.value | parameter |   conf.low | conf.high |
|----------:|----------:|----------:|----------:|----------:|-----------:|----------:|
| 0.8337731 | 0.8192308 | 0.2728889 | 0.7850497 |       507 | -0.0901546 | 0.1192392 |

Main longitudinal models (for the paragraph beginning "first"). GE scores first, then W scores.

kable(summary(lmer(score ~ condition * time 
                   + (1 | ChildID)
                   + (time | TeachID.Year1),
     data = filter(dt.excl.ge, 
                   measure == "Math")))$coefficients, digits = 3)
|                               | Estimate | Std. Error | t value |
|:------------------------------|---------:|-----------:|--------:|
| (Intercept)                   |    1.985 |      0.088 |  22.656 |
| conditionReading App          |   -0.075 |      0.169 |  -0.444 |
| timesp14                      |    0.842 |      0.051 |  16.650 |
| conditionReading App:timesp14 |   -0.072 |      0.098 |  -0.733 |
kable(summary(lmer(score ~ condition * time 
                   + (1 | ChildID)
                   + (time | TeachID.Year1), 
     data = filter(dt.excl, 
                   measure == "Math")))$coefficients, digits = 3)
|                               | Estimate | Std. Error | t value |
|:------------------------------|---------:|-----------:|--------:|
| (Intercept)                   |  458.914 |      1.735 | 264.453 |
| conditionReading App          |   -1.518 |      3.356 |  -0.452 |
| timesp14                      |   15.368 |      0.897 |  17.141 |
| conditionReading App:timesp14 |   -1.003 |      1.742 |  -0.576 |
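Note that lmer() does not report p-values; for reference (my addition, not from the Comment), a rough two-sided normal approximation for the condition-by-time interaction in the GE model above (t = -0.733) is:

# two-sided p-value treating the t statistic as approximately normal
2 * pnorm(-abs(-0.733))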

Three-way interaction models with app usage (for the paragraph beginning "second"). GE and then W scores, as above.

kable(summary(lmer(score ~ condition * time * avg.use 
                   + (1 | ChildID)
                   + (time | TeachID.Year1), 
     data = filter(dt.excl.ge, 
                   measure == "Math")))$coefficients, digits = 3)
|                                       | Estimate | Std. Error | t value |
|:--------------------------------------|---------:|-----------:|--------:|
| (Intercept)                           |    1.770 |      0.103 |  17.184 |
| conditionReading App                  |    0.068 |      0.199 |   0.340 |
| timesp14                              |    0.650 |      0.070 |   9.324 |
| avg.use                               |    0.174 |      0.046 |   3.770 |
| conditionReading App:timesp14         |    0.088 |      0.134 |   0.658 |
| conditionReading App:avg.use          |   -0.128 |      0.073 |  -1.756 |
| timesp14:avg.use                      |    0.157 |      0.040 |   3.968 |
| conditionReading App:timesp14:avg.use |   -0.137 |      0.063 |  -2.190 |
kable(summary(lmer(score ~ condition * time * avg.use 
                   + (1 | ChildID)
                   + (time | TeachID.Year1), 
     data = filter(dt.excl, 
                   measure == "Math")))$coefficients, digits = 3)
|                                       | Estimate | Std. Error | t value |
|:--------------------------------------|---------:|-----------:|--------:|
| (Intercept)                           |  454.559 |      2.002 | 227.015 |
| conditionReading App                  |    0.964 |      3.875 |   0.249 |
| timesp14                              |   12.897 |      1.237 |  10.429 |
| avg.use                               |    3.546 |      0.858 |   4.132 |
| conditionReading App:timesp14         |    1.226 |      2.379 |   0.515 |
| conditionReading App:avg.use          |   -2.381 |      1.361 |  -1.749 |
| timesp14:avg.use                      |    2.021 |      0.703 |   2.876 |
| conditionReading App:timesp14:avg.use |   -1.867 |      1.109 |  -1.684 |

Figure 2.

ggplot(filter(diffs.full, 
              !is.na(parentMA_mediansplit), 
              condition == "Math App"), 
              aes(x = avg.use, y = score.ge, 
                  col = factor(parentMA_mediansplit))) + 
  geom_point() + 
  geom_smooth(se=TRUE, method = "lm") + 
  scale_colour_solarized() + 
  xlab("Average Weekly App Usage") + 
  ylab("Improvement (Grade Equivalents)")

Math anxiety three- and four-way interaction models (for the paragraph beginning "third").

kable(summary(lmer(score ~ time * condition * ParentMAaverage.Fa13 
                   + (1 | ChildID)
                   + (time | TeachID.Year1),
     data = filter(dt.excl.ge, 
                   measure == "Math")))$coefficients, digits = 3)
|                                                    | Estimate | Std. Error | t value |
|:---------------------------------------------------|---------:|-----------:|--------:|
| (Intercept)                                        |    2.281 |      0.163 |  14.010 |
| timesp14                                           |    1.036 |      0.126 |   8.201 |
| conditionReading App                               |   -0.038 |      0.324 |  -0.117 |
| ParentMAaverage.Fa13                               |   -0.112 |      0.062 |  -1.802 |
| timesp14:conditionReading App                      |    0.167 |      0.249 |   0.670 |
| timesp14:ParentMAaverage.Fa13                      |   -0.085 |      0.053 |  -1.611 |
| conditionReading App:ParentMAaverage.Fa13          |   -0.020 |      0.127 |  -0.157 |
| timesp14:conditionReading App:ParentMAaverage.Fa13 |   -0.119 |      0.107 |  -1.117 |
kable(summary(lmer(score ~ time * condition * ParentMAaverage.Fa13 
                   + (1 | ChildID)
                   + (time | TeachID.Year1),
     data = filter(dt.excl, 
                   measure == "Math")))$coefficients, digits = 3)
|                                                    | Estimate | Std. Error | t value |
|:---------------------------------------------------|---------:|-----------:|--------:|
| (Intercept)                                        |  464.190 |      3.073 | 151.051 |
| timesp14                                           |   17.718 |      2.211 |   8.014 |
| conditionReading App                               |   -0.948 |      6.120 |  -0.155 |
| ParentMAaverage.Fa13                               |   -1.937 |      1.152 |  -1.682 |
| timesp14:conditionReading App                      |    3.740 |      4.374 |   0.855 |
| timesp14:ParentMAaverage.Fa13                      |   -1.082 |      0.915 |  -1.183 |
| conditionReading App:ParentMAaverage.Fa13          |   -0.280 |      2.355 |  -0.119 |
| timesp14:conditionReading App:ParentMAaverage.Fa13 |   -2.370 |      1.860 |  -1.275 |
kable(summary(lmer(score ~ time * condition * ParentMAaverage.Fa13 * avg.use
                   + (1 | ChildID)
                   + (time | TeachID.Year1),
     data = filter(dt.excl, 
                   measure == "Math")))$coefficients, digits = 3)
|                                                            | Estimate | Std. Error | t value |
|:-----------------------------------------------------------|---------:|-----------:|--------:|
| (Intercept)                                                |  459.443 |      4.621 |  99.432 |
| timesp14                                                   |   12.277 |      3.598 |   3.412 |
| conditionReading App                                       |   -4.837 |      8.722 |  -0.555 |
| ParentMAaverage.Fa13                                       |   -1.589 |      1.832 |  -0.867 |
| avg.use                                                    |    3.049 |      2.876 |   1.061 |
| timesp14:conditionReading App                              |   10.657 |      6.722 |   1.585 |
| timesp14:ParentMAaverage.Fa13                              |    0.000 |      1.488 |   0.000 |
| conditionReading App:ParentMAaverage.Fa13                  |    2.647 |      3.472 |   0.762 |
| timesp14:avg.use                                           |    3.899 |      2.339 |   1.667 |
| conditionReading App:avg.use                               |    3.203 |      4.750 |   0.674 |
| ParentMAaverage.Fa13:avg.use                               |   -0.071 |      1.252 |  -0.056 |
| timesp14:conditionReading App:ParentMAaverage.Fa13         |   -3.990 |      2.816 |  -1.417 |
| timesp14:conditionReading App:avg.use                      |   -4.899 |      3.857 |  -1.270 |
| timesp14:ParentMAaverage.Fa13:avg.use                      |   -0.759 |      1.018 |  -0.746 |
| conditionReading App:ParentMAaverage.Fa13:avg.use          |   -2.413 |      2.037 |  -1.184 |
| timesp14:conditionReading App:ParentMAaverage.Fa13:avg.use |    1.137 |      1.655 |   0.687 |
kable(summary(lmer(score ~ time * condition * ParentMAaverage.Fa13 * avg.use
                   + (1 | ChildID)
                   + (time | TeachID.Year1),
     data = filter(dt.excl.ge, 
                   measure == "Math")))$coefficients, digits = 3)
|                                                            | Estimate | Std. Error | t value |
|:-----------------------------------------------------------|---------:|-----------:|--------:|
| (Intercept)                                                |    2.038 |      0.247 |   8.249 |
| timesp14                                                   |    0.596 |      0.205 |   2.910 |
| conditionReading App                                       |   -0.142 |      0.466 |  -0.305 |
| ParentMAaverage.Fa13                                       |   -0.089 |      0.099 |  -0.904 |
| avg.use                                                    |    0.156 |      0.155 |   1.004 |
| timesp14:conditionReading App                              |    0.629 |      0.382 |   1.645 |
| timesp14:ParentMAaverage.Fa13                              |    0.013 |      0.085 |   0.150 |
| conditionReading App:ParentMAaverage.Fa13                  |    0.091 |      0.187 |   0.488 |
| timesp14:avg.use                                           |    0.321 |      0.134 |   2.405 |
| conditionReading App:avg.use                               |    0.096 |      0.256 |   0.374 |
| ParentMAaverage.Fa13:avg.use                               |   -0.008 |      0.068 |  -0.117 |
| timesp14:conditionReading App:ParentMAaverage.Fa13         |   -0.226 |      0.161 |  -1.410 |
| timesp14:conditionReading App:avg.use                      |   -0.333 |      0.220 |  -1.513 |
| timesp14:ParentMAaverage.Fa13:avg.use                      |   -0.072 |      0.058 |  -1.233 |
| conditionReading App:ParentMAaverage.Fa13:avg.use          |   -0.093 |      0.110 |  -0.848 |
| timesp14:conditionReading App:ParentMAaverage.Fa13:avg.use |    0.077 |      0.095 |   0.818 |
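One could also assess the four-way interaction by model comparison; a hypothetical likelihood-ratio test (my addition, not part of the original analyses) would look like:

# refit with ML (REML = FALSE) so the likelihood-ratio test is valid
m_full <- lmer(score ~ time * condition * ParentMAaverage.Fa13 * avg.use +
                 (1 | ChildID) + (time | TeachID.Year1),
               data = filter(dt.excl, measure == "Math"), REML = FALSE)
m_red <- update(m_full, . ~ . - time:condition:ParentMAaverage.Fa13:avg.use)
anova(m_red, m_full)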

4 Acknowledgements

Thanks very much to the Berkowitz et al. team for posting their raw data and for providing feedback on a draft of this reanalysis. Thanks also to Johannes Haushofer and to the members of the Language and Cognition Lab at Stanford for valuable feedback.