This analysis will investigate the effect of road building and paving on dengue and leishmaniasis (diseases transmitted by mosquito and sandfly vectors, respectively) in Peru’s Amazon basin. Building on two months of field work in summer 2022 that spurred a collaboration with Universidad Peruana Cayetano Heredia (UPCH) and the Regional Health Directorate of Madre de Dios (DIRESA, by its Spanish acronym), I plan to analyze how road quality upgrades drive changes in mobility through recently deforested areas, facilitating disease diffusion. The rapid paving of the Interoceanic Highway (2007-2009) throughout the region provides a unique opportunity to compare disease case data from health centers within 5km of the newly paved highway with data from those outside a 5km buffer, before and after the highway was paved (a difference-in-differences causal inference approach). This work will include image classification analysis to create historical road development and quality datasets from satellite imagery. I hypothesize that dengue incidence will increase almost immediately post-paving (dengue produces highly localized outbreaks and responds directly to human movement), while leishmaniasis incidence will not increase immediately post-paving and will act as a placebo (leishmaniasis transmission is affected more by frontier clearings and development of agricultural land).
A map of all spatial units in our region of interest (\(n=70\)), colored by treatment group options. All results in this document use a 5km cutoff, which results in 26 units in the treatment group and 44 in the control group. Preliminary checks suggest that the results are not sensitive to the choice of cutoff.
Bar plot showing how many spatial units had at least one case of dengue for each year in our data. Blue is “far” (control) and yellow is “near” (treatment).
Bar plot showing how many spatial units had at least one case of leishmaniasis for each year in our data. Blue is “far” (control) and yellow is “near” (treatment).
In this analysis, I estimate the following regression equation (run as a Poisson regression with the fixest package):
\[ \log(y_{i,t})=\sum\limits_{\substack{\tau\in\{2000,\ldots,2020\},\\ \tau\neq2008}} \beta_\tau\,(DIST)_i \times \mathbb{1}\{t=\tau\}+\gamma_i+\lambda_t+\mathbf{X}_{i,t}\boldsymbol{\theta}+\epsilon_{i,t} \]
Our outcome of interest, \(\log(y_{i,t})\), is the logged dengue incidence at healthcare center \(i\) in year \(t\). Our treatment variable, \(DIST\), is a binary (dummy) variable that equals one for centers near the highway and zero for centers far from it. \(DIST\) is interacted with year dummy variables (each equal to one in its own year and zero otherwise). We omit 2008, the last year before the highway paving was completed (i.e., the last year before treatment), as the baseline year.
The coefficients on the interaction terms, the \(\beta_\tau\)’s, recover the response of our outcome of interest to the road paving. Each coefficient estimates the difference between near and far healthcare centers in year \(\tau\), relative to that difference in the 2008 baseline year. If our hypothesis that the highway paving increased dengue transmission in the area is correct, we should see the coefficients diverge from zero and become positive following 2008. I conduct the same analysis for leishmaniasis incidence, for which we expect the \(\beta_\tau\)’s to remain near zero following 2008.
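As a sketch of how to read these coefficients, assume the Poisson specification models the log of expected incidence, \(\mu_{i,t}=E[y_{i,t}]\). This is an assumption about the fitted model, and the symbols \(\mu\), \(near\), and \(far\) are introduced here only for illustration. Holding the covariates fixed, the unit and year fixed effects cancel in a ratio of ratios, leaving

\[ \exp(\beta_\tau)=\frac{\mu_{near,\tau}\,/\,\mu_{near,2008}}{\mu_{far,\tau}\,/\,\mu_{far,2008}} \]

Under that reading, \(\beta_\tau=0\) corresponds to near and far centers changing proportionally since 2008, and \(\beta_\tau>0\) corresponds to a larger relative increase near the highway.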
We include spatial unit (i.e., healthcare center) fixed effects, \(\gamma_i\)’s, to help account for any time-invariant healthcare center characteristics such as differences in their ability to diagnose dengue cases, nearby populations, and general baseline dengue burden at the community level. We also include year fixed effects, \(\lambda_t\)’s, to account for large-scale climate, political, and other changes that might impact dengue comparably across all healthcare centers over time. Finally, we include three observable controls, denoted by the matrix \(\mathbf{X}_{i,t}\), to explicitly account for any variation due to changes in temperature, precipitation, or urban area. The \(\boldsymbol{\theta}\) vector recovers the covariate coefficients (and is written second because of matrix multiplication ordering). Any remaining unobserved variation is captured by the error term, \(\epsilon_{i,t}\).
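The interaction terms in the equation above can be sketched in a few lines of Python. This is only an illustration of how the design columns are defined, not the code used for the analysis (which relies on the fixest package); the helper name `event_study_dummies` and the column names `near_x_<year>` are hypothetical.

```python
from typing import Dict, Iterable


def event_study_dummies(
    near: bool,
    year: int,
    years: Iterable[int] = range(2000, 2021),
    baseline: int = 2008,
) -> Dict[str, int]:
    """Return the DIST x year interaction dummies for one center-year.

    Each column equals 1 only when the center is near the highway AND the
    observation falls in that column's year. The baseline year gets no
    column, so every coefficient is measured relative to it.
    """
    return {
        f"near_x_{tau}": int(near and year == tau)
        for tau in years
        if tau != baseline  # omitted reference year
    }


# A near center observed in 2010 switches on exactly one column.
row = event_study_dummies(near=True, year=2010)
print(row["near_x_2010"], sum(row.values()))
```

Far centers, and near centers observed in 2008, get zeros in every column, which is what makes 2008 the reference point for all \(\beta_\tau\)’s.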
Raw data showing yearly dengue incidence in treatment (<5km) and control (>10km) groups (between 5km and 10km removed to create a small spatial buffer). Vertical line is at 2008 (last year before hypothesized “shock”, i.e. the completed paving).
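The grouping rule with a spatial buffer can be sketched as below. The function name `assign_group`, the labels, and the handling of distances exactly at 5km or 10km (dropped with the buffer) are assumptions made for illustration.

```python
from typing import Optional


def assign_group(
    distance_km: float, near_km: float = 5.0, far_km: float = 10.0
) -> Optional[str]:
    """Classify a spatial unit by its distance to the highway.

    Returns "near" (treatment) below near_km, "far" (control) above far_km,
    and None for units inside the buffer, which are removed from the sample.
    """
    if distance_km < near_km:
        return "near"
    if distance_km > far_km:
        return "far"
    return None  # buffer zone: excluded from both groups


# Hypothetical distances, in km, for three spatial units.
for d in (2.0, 7.5, 12.0):
    print(d, assign_group(d))
```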
Yearly population data was calculated by combining remotely sensed WorldPop population data (2000-2020) and health center surveillance data from DIRESA (2009-2017). All years with DIRESA data were kept, and all years without DIRESA data were filled in with adjusted WorldPop data. For each spatial unit, WorldPop data was adjusted by the average ratio between DIRESA and WorldPop data over the years in which both sources were available. This adjusted population data is used for all of the following results.
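The adjustment described above can be sketched for a single spatial unit as follows. Two details are assumptions rather than documented choices: the ratio is taken as DIRESA divided by WorldPop, and "average ratio" is read as the mean of the yearly ratios (not the ratio of the means). The function name `adjusted_population` and the example numbers are hypothetical.

```python
from statistics import mean
from typing import Dict


def adjusted_population(
    worldpop: Dict[int, float], diresa: Dict[int, float]
) -> Dict[int, float]:
    """Combine the two population series for one spatial unit.

    Years with DIRESA data keep the DIRESA value. All other years use the
    WorldPop value scaled by the mean DIRESA/WorldPop ratio over the years
    in which both sources are available.
    """
    overlap = [year for year in worldpop if year in diresa]
    if not overlap:
        raise ValueError("no overlapping years to estimate the ratio")
    ratio = mean(diresa[year] / worldpop[year] for year in overlap)
    return {
        year: diresa[year] if year in diresa else worldpop[year] * ratio
        for year in worldpop
    }


# Made-up values for illustration only.
worldpop = {2008: 100.0, 2009: 100.0, 2010: 200.0}
diresa = {2009: 120.0, 2010: 220.0}
print(adjusted_population(worldpop, diresa))
```

Computing the ratio per unit, rather than once for the whole region, lets the correction differ between units where the two sources disagree by different amounts.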
Each dot in this plot represents the incidence rate ratio (the exponentiated \(\beta\) coefficient) for one year in the regression model, and the error bars display a 95% confidence interval. Each exponentiated coefficient estimates the ratio of yearly incidence rates between near and far healthcare centers in that year, after controlling for observed covariates and, through the fixed effects, for unobserved confounders that are constant within a unit or within a year. If the hypothesis that the highway paving increased dengue transmission is correct, we should see the coefficients diverge from zero and become positive (equivalently, incidence rate ratios above one) following 2008 (last year before hypothesized “shock”). Indeed, the coefficients diverge from zero starting in 2009 and generally increase over time. Additionally, all coefficients prior to 2009 are close to zero, which supports the parallel trends assumption. This model includes unit and year fixed effects, and standard errors are clustered at the unit and year levels (accounting for spatial and temporal autocorrelation). All results now include controls for yearly mean precipitation, yearly mean temperature, and logged yearly urban area, with no noticeable impact (as expected). Updated specifications: no population weighting, 5km boundary with a small buffer to separate treated and control (treated: <5km, control: >10km), logged urban area.
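The conversion from a coefficient to a plotted point can be sketched as follows, assuming a normal-approximation (Wald) interval built on the log scale and then exponentiated. Whether the plotted intervals were computed exactly this way is an assumption, and `incidence_rate_ratio` is a hypothetical helper.

```python
import math
from typing import Tuple


def incidence_rate_ratio(
    beta: float, se: float, z: float = 1.96
) -> Tuple[float, float, float]:
    """Exponentiate a log-scale coefficient and its Wald interval.

    Returns (irr, lower, upper). A coefficient of zero maps to a ratio of
    one, so "no effect" sits at 1 on the ratio scale rather than at 0.
    """
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )


# Hypothetical coefficient and standard error, for illustration only.
irr, lower, upper = incidence_rate_ratio(beta=0.5, se=0.2)
print(round(irr, 2), round(lower, 2), round(upper, 2))
```

Because the interval is exponentiated, it is asymmetric around the point estimate on the ratio scale.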
Raw data showing biannual dengue incidence in treatment (<5km) and control (>10km) groups. Vertical line is at 2008 (last year before hypothesized “shock”, i.e. the completed paving).
Same as the description for the yearly model above, except that this model uses unit and biannual-period fixed effects, and standard errors are clustered at the unit and biannual-period levels. I also colored the coefficients by rainy (pink) or dry (white) season. The rainy season is regarded as the main dengue season in this region and runs from the beginning of October to the end of March.
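One way to map a case date to its biannual period is sketched below. The season boundaries come from the text (rainy: October through March), but labeling each rainy season by the calendar year in which it starts is an assumption, and `biannual_period` is a hypothetical helper.

```python
from datetime import date
from typing import Tuple

# Rainy season: beginning of October to end of March.
RAINY_MONTHS = {10, 11, 12, 1, 2, 3}


def biannual_period(d: date) -> Tuple[int, str]:
    """Return (period_year, season) for a case date.

    Assumption: a rainy season is labeled by the year in which it starts,
    so January-March dates belong to the previous year's rainy season.
    """
    if d.month in RAINY_MONTHS:
        start_year = d.year if d.month >= 10 else d.year - 1
        return start_year, "rainy"
    return d.year, "dry"


for d in (date(2009, 1, 15), date(2009, 4, 1), date(2009, 10, 1)):
    print(d.isoformat(), biannual_period(d))
```

Keying the rainy season to its starting year keeps each October-March stretch in a single period instead of splitting it at the calendar new year.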
UPDATE (Jan 22nd): These results now include controls for biannual mean precipitation, biannual mean temperature, and logged biannual urban area, with no noticeable impact (as expected).
Raw data showing yearly leishmaniasis incidence in treatment (<5km) and control (>10km) groups. Vertical line is at 2008 (last year before hypothesized “shock”, i.e. the completed paving).
All the same as the description for the yearly dengue model above, except that now we would expect the coefficients to center on zero both before and after the paving. This would align with our hypothesis that leishmaniasis transmission was not impacted by the road paving. The coefficients are as expected. Please see the end of this document for easy side-by-side comparisons of the two disease systems.
UPDATE (Jan 22nd): These results now include controls for yearly mean precipitation, yearly mean temperature, and yearly urban area, with no noticeable impact (as expected).
Raw data showing biannual leishmaniasis incidence in treatment (<5km) and control (>10km) groups. Vertical line is at 2008 (last year before hypothesized “shock”, i.e. the completed paving).
UPDATE (Jan 22nd): These results now include controls for biannual mean precipitation, biannual mean temperature, and biannual urban area, with no noticeable impact (as expected).
Same as the description for the biannual dengue model above, except that we expect the coefficients to remain near zero, as we did for the yearly leishmaniasis regression model. Season coloring is the same as in the biannual dengue model above.