Introduction and Motivation
In a previous article, I reviewed how to gather data from the CDC WONDER mortality files. This was motivated by a discussion that I had with my students about a paper by Jurecka and colleagues titled, “Implementation of a statewide fentanyl possession law and opioid-related overdose deaths,” which was published in JAMA Health Forum.[1] I wanted to demonstrate to my students that publicly available data are ripe for performing meaningful scientific inquiry. To illustrate this point, I used the paper by Jurecka and colleagues as a motivating example. Jurecka and colleagues used data from CDC WONDER to evaluated the impact of the Colorado state fentanyl law on opioid-related overdose deaths. The previous article illustrates how one can leverage state-wide data on opioid-related overdose deaths for Colorado to answer an important public health question. In this article, I will use the data from the CDC WONDER and re-create (as closely as possible) some of the findings from the paper by Jurecka and colleagues.
Summary from the previous article
In their paper, Jurecka and colleagues studied the impact of a 2022 Colorado State fentanyl possession law (HB 22-1326) where having a small amount (1 gram to 4 grams) of fentanyl would be considered a felony and punishable with prison time (up to 180 days).
The authors’ objective was to predict what would happen to the opioid-related overdose death rates after the passing of the updated fentanyl possession law and to compare this with the observed opioid-related overdose death rates. The authors used an auto-regressive integrated moving average (ARIMA) model to predict the opioid-related overdose death rates between August 2022 to November 2023, which was then compared to the observed opioid-related overdose death rates.
This paper was interesting because it uses a serial cross-sectional design, which is a series of cross-sectional data across time that are stitched together to form a longitudinal trend of data. The authors focused on Colorado’s opioid-related overdose death rates, which used monthly data that were stitched together from the Colorado Department of Public Health and Environment.
Because of these features, I wanted to share with my students how the authors could have potentially performed their study.
So, I ended up downloaded data from the CDC WONDER database and performing a revised analysis (inspired by this paper) to answer a different question: Did the new fentanyl possession law impact the opioid-related overdose death rates between 2018 to 2023?
To answer this question, I have planed to write two articles: Part 1 will focus on how to acquire the data, and Part 2 will focus on how to perform an interrupted time series analysis.
Objectives
Part 1: Acquire opioid-related overdose death data from the CDC WONDER database link
Part 2: Perform interrupted time series analysis (Current article)
Part 2: Perform interrupted time series analysis
Data Source
I previously reviewed how I generated the data for this motivating example in a previous article. The last step that from that article generated the opioid-related overdose death rate, which involved taking the number of opioid-related overdose deaths and dividing by the population multiplied by 100,000:
\[\begin{align*} Opioid\text-related \, overdose \, death \, rate \, (OOD) = \frac{Number \, of \, deaths}{Population \, total} * 100,000 \end{align*}\]
This generated a rate that was normalized to 100,000 people.
The CDC WONDER did not contain the population levels for Colorado, so I had to get the Annual Population Data for Colorado from the Federal Reserve Bank of St. Louis (FRED).
FRED provides regional data for all states and territories in the United States (US), and is a good resource for population levels.
Colorado Population Levels
I used the annual population level, which I downloaded into a
*.CSV file. I was specifically looking for the population
levels from 2018 to 2023. When you download the data, the population
level will be in the 1000s. Therefore, you will need to multiply by 1000
to get the actual population level. I copied and pasted this data into
an Excel file containing the monthly opioid-overdose death data from CDC
WONDER (Note: You will need to use the *.XLSX extension to
do this because Excel functions may not work in the *.CSV
format.) Then, I used the VLOOKUP() function in Excel to
assign the annual population levels to the year. Once I had my data, I
saved this as a *.CSV file. You can download the final
*.CSV file from my GitHub
site.
Notice that we have monthly data on the number of deaths, but the population level data is for the year. For this example, we will assume that the population level is stable across the months for each year. This will allow us to estimate the monthly opioid-overdose death rate. However, our estimates will be slightly different from the original study.
Estimating the monthly opioid-overdose death rate
Now that we have our data, let’s use R to perform the
interrupted time series analysis (ITSA).
Step 1: Defining the research question
In the original article by Jurecka and colleagues, their objective was to use ARIMA to forecast the trends of monthly opioid-related overdose deaths after the implementation of the new state-wide fentanl law. However, I wanted to change this into an ITSA question: Was there a difference in the monthly opioid-related overdose death trends before and after implementing the new fentanyl law?
Note: The original article also looked at the subgroups (Hispanic adults, Non-Hispanic Black adults, Non-Hispanic White adults). For this exercise, we will only focus on the entire adult population of Colorado.
Step 2: Visualize the trends
We can use R to visual the trends and compare this to
the Figure 1A in the original article.
First, we will load the libraries and data. Since the data is in my GitHub site, we can download it by using the following path:
### Load libraries
library("tidyverse")
library("ggplot2")
library("forecast")
library("ggeffects")
library("zoo")
library("margins")
### Load data
data1 <- read.csv("https://raw.githubusercontent.com/mbounthavong/R-tutorials/refs/heads/main/Data/colorado_ood_month_v2.csv")
### Inspect data
head(data1)
## state state_code month year deaths population rate
## 1 Colorado 8 1 2018 60 5697155 1.0531572
## 2 Colorado 8 2 2018 54 5697155 0.9478415
## 3 Colorado 8 3 2018 53 5697155 0.9302889
## 4 Colorado 8 4 2018 50 5697155 0.8776310
## 5 Colorado 8 5 2018 54 5697155 0.9478415
## 6 Colorado 8 6 2018 38 5697155 0.6669996
Then, we can generate a figure to visualize the trends.
### Plot
ggplot(data1, aes(x = month, y = rate)) +
geom_line() +
geom_vline(xintercept = 55, color = "darkred", linetype = "dashed", linewidth = 1) +
ylim(0, 10) +
theme_bw()
We can compare this to Figure 1A from the original study. Notice how closely our generated visuals compares to the original. Even though we used annual population levels, our estimated monthly opioid-related overdose death rates are visually similar to the original.
Comparing figures from original study
We can also compare the average annual trends with our data to the original data.
### Average annual rate
annual.rate <- data1 %>%
group_by(year) %>%
mutate(sum_ood = sum(deaths)) %>%
mutate(denom = mean(population)) %>%
mutate(rate_annual = (sum_ood / denom)*100000)
annual.rate %>%
group_by(year) %>%
summarize(total_deaths = sum(deaths),
rate_annual = mean(rate_annual))
## # A tibble: 6 × 3
## year total_deaths rate_annual
## <int> <int> <dbl>
## 1 2018 600 10.5
## 2 2019 669 11.6
## 3 2020 986 17.0
## 4 2021 1317 22.7
## 5 2022 1210 20.7
## 6 2023 1231 20.9
Comparing total and rates from original study
When compard to the original study, our numbers are not that far off. Although we used the same data source from CDC WONDER, there are some differences. We did our best to follow the instructions from the original paper to generate the data, but there might be some details that were missed. Our values are slightly higher, which makes me believe that we may not have excluded a certain group or population. For this exercise, I am pretty content with the data that generated.
Step 3 - Conduct the interrupted time series analysis (ITSA)
For this exercise, the structural form of the ITSA is:
\[\begin{align*} Y = \beta_{0} + \beta_{1}Month + \beta_{2}Post + \beta_{3}(Month*Post) \end{align*}\]
where Month denotes the time interval, Post
denotes the period before and after the policy implementation, and
Month*Post in the interaction term. In terms of the model
coefficients, \(\beta_{0}\) denotes the
constant or Y-intercept, \(\beta_{1}\)
denotes the average monthly change before the policy implementation,
\(\beta_{2}\) denotes the level change
immediately after the policy implementation, and \(\beta_{3}\) denotes the difference in the
trends after and before the policy implementation.
In order to construct the ITSA, we will need a new variable
Post that distinguishes the period before and after
implementation of the policy. For our exercise, the Post
variable will be defined as month >= 55, which is aligned with July
2022, the start of the policy implementation.
Once we create the Post variable, we will construct the
ITSA using the linear regression model framework. Notice that the \(\beta_{2}\) estimates from the
lm1 object doesn’t provide the actual level change
(immediate effects after policy implementation). We will need to use the
margins() function to estimate the level change at
month == 55.
### Step 1: Create post variable
data1$post[data1$month < 55] = 0
data1$post[data1$month >= 55] = 1
### Step 2: Construct the ITSA - Linear regression model
lm1 <- lm(data = data1, rate ~ month + factor(post) + month:factor(post))
### Step 3: Present the results
round(
cbind(
summary(lm1)$coef,
confint(lm1)
), 3)
## Estimate Std. Error t value Pr(>|t|) 2.5 % 97.5 %
## (Intercept) 0.686 0.070 9.798 0.000 0.546 0.826
## month 0.023 0.002 10.597 0.000 0.019 0.028
## factor(post)1 0.227 0.797 0.285 0.776 -1.363 1.817
## month:factor(post)1 -0.009 0.013 -0.671 0.504 -0.034 0.017
### Step 4: Estimate the average level-difference
margins1 <- margins(lm1,
type = "response",
at = list(month = 55),
variables = "post")
summary(margins1)
## factor month AME SE z p lower upper
## post1 55.0000 -0.2436 0.1370 -1.7776 0.0755 -0.5122 0.0250
I summarized the results from the ITSA in this table.
ITSA results
We can also generate the ITSA figure using R.
### Plot: Part 1 - Generate the predictions
plot.itsa1 <- ggpredict(lm1, terms = c("month[1:71]", "post"), ci.level = 0.95) # Full model
plot.itsa2 <- ggpredict(lm1, terms = c("month[1:71]"), ci.level = 0.95) # Counterfactual
### Plot: Part 2 - Plot the fitted values
# Start with the scatter (average values at each year by periods)
colors <- c("darkgreen", "navyblue") # Create color vector
plot(data1$month, data1$rate, pch = 19, cex = 1.5, lwd = 1,
col = colors[factor(data1$post)],
xlab = "Time (month)",
ylab = "Average rate of OOD (per 100K)",
ylim = c(0, 3))
legend("topleft", title = "Policy implementation",
cex = 0.6,
legend = c("Pre", "Post"),
fill = c("darkgreen", "navyblue"))
# Plot: Part 3 - Add the lines for the per-post index periods
with(subset(plot.itsa1, x <= 55 & group == 0), lines(x, predicted, col = "darkgreen", lwd = 2))
with(subset(plot.itsa1, x >= 55 & x <= 71 & group == 1), lines(x, predicted, col = "navyblue", lwd = 2))
# Plot: Part 4 - Add the vertical dashed line for the intervention
# Line for the index period
abline(v = 55, col = "red", lty = 2, lwd = 2)
# Plot: Part 5 - Add the counterfactual line
with(subset(plot.itsa2, x >= 55 & group == 1), lines(x, predicted, col = "darkgreen", lty = 2, lwd = 2))
Step 4 - Summarize findings
Prior to the policy implementation, opioid-related overdose deaths was increasing at a monthly average rate of 0.023 deaths per 100,000 population, which was statistically significant. Immediately after policy implementation, the average opioid-related overdose deaths dropped by 0.243 deaths per 100,000; however, this was not statistically significant. After policy implementation, the number of opioid-related overdose deaths decreased by a monthly average of 0.009 deaths per 100,000; however, this was not statistically significant.
Bonus - ARIMA
In the original paper, Jurecka and colleagues sought to forecast the monthly opioid-related death rates after policy implementation. I’m not familiar with ARIMA models, but I tried to perform this analysis using some R tools.
### ARIMA Model:
data2 <- data.frame(data1$month, data1$rate)
zoo1 <- read.zoo(data2, FUN = as.numeric)
plot(zoo1)
#### Train and Test series
train_series = zoo1[1:54]
test_series = zoo1[55:71]
## Create ARIMA models
#### Iterate to find the best ARIMA Model
auto.arima(zoo1, trace = TRUE)
##
## ARIMA(2,1,2) with drift : 5.078883
## ARIMA(0,1,0) with drift : 18.88565
## ARIMA(1,1,0) with drift : 9.723031
## ARIMA(0,1,1) with drift : 0.9353447
## ARIMA(0,1,0) : 16.99648
## ARIMA(1,1,1) with drift : 3.111483
## ARIMA(0,1,2) with drift : 3.070659
## ARIMA(1,1,2) with drift : 4.753693
## ARIMA(0,1,1) : 0.2465992
## ARIMA(1,1,1) : 2.428447
## ARIMA(0,1,2) : 2.4264
## ARIMA(1,1,0) : 7.980161
## ARIMA(1,1,2) : 4.061983
##
## Best model: ARIMA(0,1,1)
## Series: zoo1
## ARIMA(0,1,1)
##
## Coefficients:
## ma1
## -0.5681
## s.e. 0.0954
##
## sigma^2 = 0.05584: log likelihood = 1.97
## AIC=0.07 AICc=0.25 BIC=4.56
#### The best model is ARIMA(0,1,1)
arima1 = arima(train_series, order=c(0, 1, 1))
print(arima1)
##
## Call:
## arima(x = train_series, order = c(0, 1, 1))
##
## Coefficients:
## ma1
## -0.5468
## s.e. 0.1110
##
## sigma^2 estimated as 0.06411: log likelihood = -2.58, aic = 9.16
#### Plot the forecast for periods 55 to 71
forecast_predict <- forecast(arima1, h = 17, level = c(95))
plot(forecast(forecast_predict))
# Add the lines for the pre-post index periods
with(subset(plot.itsa1, x <= 55 & group == 0), lines(x, predicted, col = "darkgreen", lwd = 2))
with(subset(plot.itsa1, x >= 55 & x <= 71 & group == 1), lines(x, predicted, col = "navyblue", lwd = 2))
This was my attempt at ARIMA modeling, so I’m not sure if this is even accurate. But it was rewarding to see that I had some forecast made using this method. The forecast trend (light blue line) appears to have a greater reduction in the number of opioid-related overdose deaths immediately after policy implementation. Additionally the forecast trend is much lower than one generated by the linear regression model (dark blue line).
Note that I don’t use ARIMA models, so please be careful with these findings.
Conclusions
This was an entertaining exercise that I shared with my students. It’s not typical for a study to have enough information where the reader can replicate the findings or data. With this study by Jurecka and colleagues, I was able to share with my students the power of leverage publicly available data to answer meaningful public health questions. I hope that this exercise helps to inspire others to do that same.
Ackowledgements
I don’t know much about ARIMA models, so I had to read some articles
and blogs on the materials. I found the blog by Riaz
Khan helpful in understanding the coding for ARIMA models. I also found
the online book, Forecasting:
Principles and Practice by Rob J Hyndman and George Athanasopoulos
to be incredibly helpful with using the auto.arima()
function.
References
- Jurecka C, Adams J, Padmanabhan P, Glanz J, Christine P, Guan X, Kline D, Binswanger I, Barocas J. Implementation of a Statewide Fentanyl Possession Law and Opioid-Related Overdose Deaths. JAMA Health Forum. 2025 Aug 1;6(8):e252654. doi: 10.1001/jamahealthforum.2025.2654. PMID: 40748547; PMCID: PMC12317351.
Disclosures and Disclaimers
This is a work in progress; thus, I may update this in the future.
Any errors or mistakes are my fault, and I’ll try my best to fix them.
Lastly, this is for educational purposes only.