We begin by presenting brief descriptive results. Next we report our primary analyses of intervention success. We end with a number of secondary, exploratory analyses.

Descriptive analyses

We had six primary outcome variables, corresponding to our six tasks: three in mathematics (Arithmetic, Place Value, and the standardized WJ III assessment) and three cognitive measures (Matrix Reasoning, Go/No Go, and Spatial Working Memory). The distribution of each variable is shown in Figure 2.

As is evident from Figure 2, all measures were higher for second graders than for first graders, and all measures showed positive growth over the course of the school year. Some showed larger changes than others due to features of the tasks themselves. For example, the place value measure was explicitly designed to capture content being learned during these two years of schooling and showed substantial movement. (Its distribution was also idiosyncratic because an understanding of two-digit place value would allow a student to complete a particular subset of questions.) In contrast, the Go/No Go and Spatial WM tasks showed smaller changes relative to the amount of individual variation that we saw.

Figure 2. Histograms showing the distribution of scores from each task in our battery, split by grade level. Dashed lines show means. Upper panels show pre-test scores, lower panels show post-test scores.

| Task | Grade | \(r\) | 95% CI lower | 95% CI upper | \(p\) |
|---|---|---|---|---|---|
| Arithmetic | 1st | 0.49 | 0.26 | 0.66 | 0.0001 |
| Arithmetic | 2nd | 0.51 | 0.35 | 0.64 | < 0.0001 |
| Place Value | 1st | 0.32 | 0.07 | 0.53 | 0.0136 |
| Place Value | 2nd | 0.64 | 0.51 | 0.74 | < 0.0001 |
| WJ III | 1st | 0.33 | 0.08 | 0.54 | 0.0099 |
| WJ III | 2nd | 0.38 | 0.20 | 0.54 | 0.0001 |
| Matrix Reasoning | 1st | 0.36 | 0.12 | 0.57 | 0.0043 |
| Matrix Reasoning | 2nd | 0.32 | 0.14 | 0.49 | 0.0010 |
| Go/No Go | 1st | 0.56 | 0.36 | 0.72 | < 0.0001 |
| Go/No Go | 2nd | 0.45 | 0.28 | 0.59 | < 0.0001 |
| Spatial WM | 1st | 0.42 | 0.18 | 0.62 | 0.0010 |
| Spatial WM | 2nd | 0.31 | 0.12 | 0.47 | 0.0019 |

All tasks showed modest test-retest reliability across the school year (\(r\) ranging from 0.31 to 0.64), comparable to the reliabilities found in our previous work (Barner et al., 2016). Higher reliability would of course increase our power to see condition effects, but might be difficult to achieve without substantially longer testing sessions. In addition, some correlations may be depressed because of real change over the course of the study. For example, we would not expect place value scores to be highly correlated given that many students learn new place value concepts over the course of the year.
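
As an illustration of how a test-retest correlation and its confidence interval can be obtained, the sketch below computes Pearson's \(r\) and a 95% interval via the Fisher \(z\)-transformation. This is a common approach, but the text does not state how the reported intervals were computed, so the method is an assumption; the function names and the example scores are hypothetical and are not study data.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Confidence interval for r via the Fisher z-transformation (assumed method)."""
    z = math.atanh(r)                      # transform r to an approximately normal scale
    se = 1.0 / math.sqrt(n - 3)            # standard error of z
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Hypothetical pre- and post-test scores for six children (not study data).
pre = [0.10, 0.25, 0.30, 0.45, 0.50, 0.70]
post = [0.30, 0.35, 0.55, 0.50, 0.80, 0.75]

r = pearson_r(pre, post)
lower, upper = fisher_ci(r, len(pre))
print(f"r = {r:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

Because the interval is built on the transformed scale and mapped back, it is asymmetric around \(r\), which matches the shape of the intervals in the table above.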

We also examined intervention uptake at the end of the study (Figure 3). We found a roughly bimodal distribution of children, with some children relatively proficient at decoding abacus representations and others quite poor, able to do so only for one- or two-digit displays. The relative balance of children in the two modes differed across grades, however, with a much larger proportion of second graders gaining proficiency in the technique.

These uptake findings are an important metric of the appropriateness of MA instruction. A relatively small proportion of first graders could accurately decode a multi-digit abacus by the end of a year of instruction (19%). Thus, MA may not have been an appropriate curriculum for these children. We discuss this result in more depth below, but we note that it qualifies the interpretation of all subsequent outcome measures for the intervention.

Primary analyses

The primary question addressed by our confirmatory analyses was whether assignment to treatment condition (MA vs. Control) resulted in differential change in mathematical or cognitive measures. Due to model convergence issues, we deviated from our pre-registered plan by removing random slopes for individual classes (this move follows our standard operating procedure). Table 3 shows all models, with \(p\)-values computed via the \(t=z\) method (Barr et al., 2013). Figures 4 and 5 show scores for mathematics and cognitive tasks, respectively.
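
To make the \(t=z\) method concrete, the sketch below treats a model's \(t\) statistic as if it were a standard normal \(z\) score and computes a two-sided \(p\)-value from it. The function name is hypothetical. Feeding in the rounded \(t\) values reported in Table 3 gives \(p\)-values that agree closely with the tabled ones, with small differences attributable to rounding of \(t\).

```python
from statistics import NormalDist


def p_from_t_as_z(t):
    """Two-sided p-value obtained by treating a t statistic as a standard normal z."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


# Rounded t values for two of the time-by-condition interactions in Table 3.
for t in (1.80, -2.15):
    print(f"t = {t:+.2f} -> p = {p_from_t_as_z(t):.4f}")
```

The approximation is reasonable when the number of observations is large, since the \(t\) distribution then approaches the normal.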

| Task | Predictor | Beta | Std Err | \(t\) | \(p\) |
|---|---|---|---|---|---|
| Arithmetic | Intercept | 0.036 | 0.014 | 2.57 | 0.0103 |
| Arithmetic | Second Grade | 0.093 | 0.015 | 6.27 | < 0.0001 |
| Arithmetic | Post-Test | 0.145 | 0.010 | 14.23 | < 0.0001 |
| Arithmetic | Mental Abacus | 0.009 | 0.016 | 0.54 | 0.5872 |
| Arithmetic | Post-Test × Mental Abacus | -0.011 | 0.014 | -0.79 | 0.4322 |
| Place Value | Intercept | 0.031 | 0.034 | 0.92 | 0.3599 |
| Place Value | Second Grade | 0.258 | 0.035 | 7.33 | < 0.0001 |
| Place Value | Post-Test | 0.251 | 0.027 | 9.16 | < 0.0001 |
| Place Value | Mental Abacus | 0.030 | 0.039 | 0.76 | 0.4476 |
| Place Value | Post-Test × Mental Abacus | 0.067 | 0.037 | 1.80 | 0.0720 |
| WJ III | Intercept | 0.224 | 0.014 | 16.51 | < 0.0001 |
| WJ III | Second Grade | 0.147 | 0.014 | 10.66 | < 0.0001 |
| WJ III | Post-Test | 0.190 | 0.012 | 16.26 | < 0.0001 |
| WJ III | Mental Abacus | 0.003 | 0.016 | 0.20 | 0.8428 |
| WJ III | Post-Test × Mental Abacus | 0.000 | 0.016 | -0.01 | 0.9956 |
| Matrix Reasoning | Intercept | 0.206 | 0.024 | 8.59 | < 0.0001 |
| Matrix Reasoning | Second Grade | 0.082 | 0.025 | 3.34 | 0.0009 |
| Matrix Reasoning | Post-Test | 0.118 | 0.020 | 5.94 | < 0.0001 |
| Matrix Reasoning | Mental Abacus | -0.005 | 0.028 | -0.17 | 0.8680 |
| Matrix Reasoning | Post-Test × Mental Abacus | -0.027 | 0.027 | -1.02 | 0.3085 |
| Go/No Go | Intercept | 0.725 | 0.018 | 39.26 | < 0.0001 |
| Go/No Go | Second Grade | 0.071 | 0.019 | 3.62 | 0.0003 |
| Go/No Go | Post-Test | 0.043 | 0.013 | 3.30 | 0.0010 |
| Go/No Go | Mental Abacus | 0.008 | 0.021 | 0.37 | 0.7100 |
| Go/No Go | Post-Test × Mental Abacus | -0.037 | 0.017 | -2.15 | 0.0315 |
| Spatial WM | Intercept | 0.296 | 0.018 | 16.24 | < 0.0001 |
| Spatial WM | Second Grade | 0.048 | 0.018 | 2.71 | 0.0068 |
| Spatial WM | Post-Test | 0.072 | 0.018 | 3.93 | 0.0001 |
| Spatial WM | Mental Abacus | 0.024 | 0.021 | 1.13 | 0.2578 |
| Spatial WM | Post-Test × Mental Abacus | 0.011 | 0.025 | 0.44 | 0.6578 |

Figure 4. Performance on mathematics measures by time and grade. Error bars show 95% confidence intervals, computed by non-parametric bootstrap.

Figure 5. Performance on cognitive measures by time and grade. Error bars show 95% confidence intervals, computed by non-parametric bootstrap.
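
For readers unfamiliar with the error bars in Figures 4 and 5, the sketch below shows one common form of non-parametric bootstrap, the percentile interval for a mean. The captions do not say which bootstrap variant or how many resamples were used, so the percentile method, the default of 2000 resamples, the function name, and the example scores are all assumptions made for illustration.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)
    # Resample with replacement, recompute the mean each time, and sort the results.
    boot_means = sorted(mean(rng.choices(values, k=n)) for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return boot_means[lo_idx], boot_means[hi_idx]


# Hypothetical proportion-correct scores for one grade-by-condition cell (not study data).
scores = [0.20, 0.35, 0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.90]
lower, upper = bootstrap_ci(scores)
print(f"mean = {mean(scores):.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
```

The appeal of this approach for bounded scores is that it makes no normality assumption about the sampling distribution of the mean.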

Beginning with the math measures, we did not see numerical or statistical evidence of differential change in performance for either the in-house arithmetic or standardized WJ III measures. This result does not replicate the findings of Barner et al. (2016), where differences on these measures emerged numerically after a single year of training. We discuss possible reasons for this disparity below. We did see a numerical trend towards the predicted time by condition interaction for the place value measure, however. Students in the MA condition tended to make a larger gain in place value scores over the course of the study than those in the control group. This result was marginal in the mixed effects model (\(p = .07\)), so we interpret it with caution (see footnote 1). Nevertheless, it is consistent with a similar trend in Barner et al. (2016).

In the cognitive measures, we did not see evidence of differential changes in performance for either matrix reasoning or spatial working memory. These results are consistent with our previous findings: once again, we did not detect MA-related changes to spatial working memory.

We did, however, find an unpredicted negative interaction of time and condition, such that students in the control group appeared to increase more in performance on the Go/No Go task. One possible explanation for this surprising finding would be a speed-accuracy tradeoff such that children in the MA group were less accurate but faster. This explanation appeared plausible based on visual inspection of the reaction times (Figure 6). To test it, we performed an exploratory analysis in which we re-ran our planned linear mixed effects model on Go/No Go accuracy scores, this time including a main effect of reaction time to control for the different average timing of participants’ responses on correct trials. Consistent with the idea of a speed-accuracy tradeoff, the magnitude of the time by condition interaction was reduced by an order of magnitude and was no longer significant (\(\beta = -0.004\), \(p = 0.721\)). Thus, we believe the Go/No Go effect reflects a shift in a speed-accuracy tradeoff rather than a true change in cognitive functioning.

In sum, we saw at best limited evidence for the effectiveness of the MA intervention. In the math tasks, only the place value measure showed a hint of an intervention effect. And in the cognitive tasks, there were no intervention effects except for a possible shift in response criterion on the Go/No Go task.

Secondary analyses

Spatial working memory mediation analysis

In our previous study, we found that spatial working memory score at study initiation mediated the effects of the intervention. Children who were above the median in spatial working memory tended to show the largest gains in arithmetic performance from studying MA. We replicated this analysis on all three of our math measures (Figure 7). Of the three, only place value showed the predicted pattern, and only for the second graders. Numerically, the pattern for place value was similar to what we observed in the arithmetic measure in our first study: greater growth for high spatial WM abacus users. But in exploratory models, the three-way interaction of spatial working memory, time, and condition was not significant. Detecting such an effect would likely have required considerably more power than our study had.
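
The median-split comparison described above can be sketched as follows: children are labeled high or low on pre-test spatial working memory relative to the sample median, and mean pre-to-post gains are then compared across condition and working memory group. The record fields, the function name, and the values are hypothetical. Whether the median was taken over the whole sample or within grade is not stated in the text, so the single overall median used here is an assumption.

```python
from statistics import mean, median


def median_split_gains(records):
    """Mean pre-to-post gain by condition and spatial WM group (overall median split)."""
    cutoff = median(r["swm_pre"] for r in records)
    cells = {}
    for r in records:
        # "High" means strictly above the median, following the wording in the text.
        wm_group = "high" if r["swm_pre"] > cutoff else "low"
        cells.setdefault((r["condition"], wm_group), []).append(r["post"] - r["pre"])
    return {cell: mean(gains) for cell, gains in cells.items()}


# Hypothetical records (not study data): pre-test spatial WM plus pre/post math scores.
children = [
    {"condition": "MA", "swm_pre": 0.40, "pre": 0.20, "post": 0.60},
    {"condition": "MA", "swm_pre": 0.20, "pre": 0.20, "post": 0.40},
    {"condition": "Control", "swm_pre": 0.35, "pre": 0.25, "post": 0.50},
    {"condition": "Control", "swm_pre": 0.25, "pre": 0.25, "post": 0.45},
]

for cell, gain in sorted(median_split_gains(children).items()):
    print(cell, round(gain, 2))
```

A descriptive table like this only shows the numerical pattern; the inferential test is the three-way interaction in the mixed effects model.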

Math anxiety

We further assessed whether the MA intervention led to changes in math anxiety at the end of the study. As shown in Figure 8, although first graders showed more math anxiety overall than second graders, there were only minor numerical differences in math anxiety between groups.


  1. In exploratory \(t\)-tests on place value scores, we did see a significant post-test difference between intervention groups (\(t(167.5) = -2.22\), \(p = 0.03\)). This difference was not significant for first graders alone (\(t(63.93) = -0.62\), \(p = 0.54\)) but was significant for second graders (\(t(97.36) = -2.33\), \(p = 0.02\)).
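
The fractional degrees of freedom in the footnote are what one would expect from Welch's unequal-variance \(t\)-test, although the text does not name the test, so this is an inference. Under that assumption, the sketch below computes the Welch \(t\) statistic and the Welch-Satterthwaite degrees of freedom for two independent groups. The function name and the example scores are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for two samples."""
    # Squared standard error contributed by each group (sample variance / n).
    va = variance(a) / len(a)
    vb = variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical post-test scores for two groups (not study data).
control = [1, 2, 3, 4, 5]
abacus = [2, 4, 6, 8, 10, 12]

t, df = welch_t(control, abacus)
print(f"t({df:.2f}) = {t:.2f}")
```

The sign of \(t\) depends only on the order in which the groups are passed, so a negative value here means the first group had the lower mean.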