Abstract
We evaluate the effect of Project SPARK, an effort to introduce aerobic exercise to Staten Island’s K-12 PE classes, on academic outcomes. We find mild evidence that it may have had a positive impact, and recommend further steps.The Staten Island Borough President’s office has conducted a preliminary investigation into the effects of aerobic exercise on academic achievement, in consultation with Dr. John Ratey. This program was inspired by and named for his book “SPARK”, an investigation of the benefits of exercise. This was a minimal pilot program intended as a proof of concept, yet it may have had a small positive effect.
Dr. Ratey helped Naperville High implement a small scale study of the effects of exercise on at-risk children and found positive results, although the study was small and not randomized.
One other promising piece of evidence comes from a small randomized trial in Augusta, Georgia. It found that obese young children randomly assigned to different levels of intense exercise did better on standardized tests and assessments of executive functioning, and that their gains scaled with the duration of the exercise.
Our study is not an ideal RCT but it targets a different group of students than the Georgia study, which helps us learn more about how effects might generalize. In particular, our study was not targeted towards at-risk or obese children.
One of Staten Island’s public schools agreed to serve as a small scale pilot for SPARK. They took a group of just under one hundred ninth graders and, starting in the second quarter of the school year, divided them into two groups. The students in two physical education classes, comprising two thirds of the group, supplemented their standard sports-based curriculum with intense cardio exercise. At the start of the program, this meant seven minutes of intense work on treadmills, eventually expanded to twenty, for three days per school week. Observers reported that, even in the early phase of the program, students were visibly exhausted. Note that both groups of students had the same physical education curriculum in the first quarter, providing a useful baseline.
The students were still subject to the state physical education curriculum, which focuses more on sports than intense aerobic activity. The implementation of SPARK was therefore not as extensive as any of the partners involved would have wanted.
A typical physical education curriculum focuses more heavily on sports than on getting students’ heart rate up. SPARK replaced the games that students typically play with intense workouts designed to keep them at a high percentage of their maximum heart rates.
The students were split based on convenience. Two gym classes were arbitrarily chosen and the third was left out as a control, but the school officials who separated the students did not do so based on any systematic difference between the students.
We were given data on student performance in math, English, and science classes, with one average score per quarter. Our main specification consists of a differences in differences approach. The first quarter, when all students had the same gym curriculum, provides a baseline for comparison. We test if student academic growth differed between treatment and control over the remaining marking periods.
Our first research design compares the difference in grades between the first and subsequent marking periods for SPARK and non SPARK students, without controls for type of class. We run three regressions in this format, with the three dependent variables being the change in grades between the first marking period and the three subsequent marking periods.
The regression design is as follows:
\[\Delta_{i,s,m}=\alpha_m+\beta_m \cdot SPARK_i\]
Here \(\Delta_{i,s,m}\) was the change in score for student \(i\) in subject \(s\) from marking period 1 to marking period \(m\), and \(SPARK_i\) is a dummy taking value 1 if and only if the student was in SPARK.
Our second research design considers the possibility that SPARK might have had different effects on different subjects. We therefore add controls for the type of subject and replace the SPARK dummy variable with an interaction term between SPARK and each of the three types of classes.
The design is as follows:
\[\Delta_{i,s,m}=\sum_j\alpha_{j,m}+\sum_j\beta_{j,m} \cdot SPARK_{i,j}\]
Here \(\alpha_j\) takes value 1 if \(j=s\) and zero otherwise, and \(SPARK_{i,j}\) takes value 1 only if the student is in SPARK and if \(j=s\).
The levels of \(j\) and \(s\) correspond to math, English, and science.
As a handful of students were in a higher-level math course, we run one further test where the levels of \(j\) and \(s\) correspond to English, science, and each of the two math courses.
All outcome variables are based on student grades, which are on the usual 0 to 100 scale. A 2.5 point coefficient, therefore, means that SPARK students saw their scores increase by a quarter of a grade level relative to non-SPARK students.
In both regression designs, the observations are student-classes.
First we report the simplest output, not broken up by subject. Note that some students did drop out of the sample in the middle of the school year. We do not know why this is the case.
Next, we break out results by subject.
Finally we break math into two separate categories.
The point estimates for math were generally higher than the point estimates for the other subjects, and were more often significant. The greater responsiveness of math scores to an intervention is in line with much of the rest of the literature (see Graff Zivin et al 2018 or the discussion in Fryer 2017).
Most of the point estimates were positive, and a handful were significant. The largest effect size was a roughly one-third grade level difference in math from quarter one to quarter two. That is, students in the treatment group saw a change in grades that was one third of a grade level higher from marking period to marking period than the control. Still, these results were not all that large or significant.
We include an R notebook in the replication page for this project with more robustness tests, none of which meaningfully alter our conclusions.
This minimal pilot program offers mild evidence that replacing students’ sports-based curriculum with aerobic activity may improve academic performance. Given that this is just in addition to the more obvious fitness benefits of exercise, we recommend exploring this effect further. The ideal research design would be a pre-registered randomized trial, publicly outlining the empirical strategy prior to implementing it.
We suggest that every gym class in Staten Island be placed into a treatment and control group at random, with the treatment group switching from a sports-based curriculum to an aerobic curriculum.
A waiver from the State curriculum would allow a more aggressive implementation of SPARK, which as noted was constrained to twenty minutes a day for three days a week.
We also recommend collecting baseline data on each of the students, especially academic performance in the prior year and level of fitness going into the school year, and committing in the pre-registration plan to divide students along these pre-selected variables to test for heterogeneous effects. This will help us learn whether less fit or less academically successful students benefit more from this intervention.
Beautrelet, I. (2016, December 13). Clustered Standard Errors in R [Web log post]. Retrieved August 2, 2018, from https://economictheoryblog.com/2016/12/13/clustered-standard-errors-in-r/
Davis, C. L., Tomporowski, P. D., McDowell, J. E., Austin, B. P., Miller, P. H., Yanasak, N. E., . . . Naglieri, J. A. (2011). Exercise improves executive function and achievement and alters brain activation in overweight children: A randomized, controlled trial. Health Psychology, 30(1), 91-98.
Fryer Jr, R. G. (2017). The production of human capital in developed countries: Evidence from 196 randomized field experimentsa. In Handbook of Economic Field Experiments (Vol. 2, pp. 95-322). North-Holland.
Graff Zivin, J., Hsiang, S. M., & Neidell, M. (2018). Temperature and human capital in the short and long run. Journal of the Association of Environmental and Resource Economists, 5(1), 77-105.
Hendricks, Paul (2015). anonymizer: Anonymize Data Containing Personally Identifiable Information. R package version 0.2.0. https://cran.r-project.org/web/packages/anonymizer/index.html
Hlavac, Marek (2018). stargazer: Well-Formatted Regression and Summary Statistics Tables. R package version 5.2.2. https://CRAN.R-project.org/package=stargazer
Valdeo, D. (2008, January 13). Exercise Seen as Priming Pump for Students’ Academic Strides. Retrieved August 14, 2018, from https://www.mdc.edu/main/images/Exercise_Seen_as_Priming_Pump_for_Students_tcm6-22258.pdf