require(tidyverse)
require(scales)
require(prais)
require(DT)
Please define your own file path in the ().
admission <- read_csv(r'(D:\Data Science\FETP\Tutorial topic\interrupted_time_series\admission.csv)')
DT::datatable(admission)
This dataset is a time series of the number of hospitalization occurred in province X from Jan 2016 to Aug 2022, 50 months before and 30 months after COVID-19 occurred in this province (Mar 2020).
We want to determine the impact of COVID-19 on the number of hospitalization in province X by performing Interrupted Time Series (ITS) analysis by using segmented regression.
Equation of segmented regression is \[Yt =
𝜷0 + 𝜷1* time + 𝜷2* intervention + 𝜷3* time\;after\;intervention +
𝟄t\] Where
admission <- admission %>%
mutate(timepoint = factor(timepoint, level=timepoint))%>%
## Define variables time, covid (intervention), and time_covid (time after intervention)
mutate(time = row_number())%>%
mutate(covid = if_else(time >=51,1,0))%>%
mutate(time_covid = case_when(covid == 1 ~ row_number()-50, TRUE~0))
DT::datatable(admission )
ggplot(data = admission)+
##Plotting line chart
geom_point(aes(x = timepoint, y = n))+
geom_line(aes(x = timepoint, y = n),group =1)+
##Plotting vertical line to segmented the time series (before and after COVID-19)
geom_vline(xintercept = 'Mar-20', linetype="dotdash", color ='red',size=1.2)+
##Following code is aesthetic adjustment
scale_y_continuous(limits = c(0,30000),labels = comma)+
xlab('Time')+
ylab('Number of admission')+
labs(title = 'Time series of the number of hospitalization in province X from Jan 2016 to Aug 2022')+
theme_bw()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.3,size = 16),
axis.text = element_text(size=16),
axis.title = element_text(size = 16),
plot.title = element_text(size=20))
The time series plot show that the number of hospitalization decreased after COVID-19.
We will perform ITS analysis, by segmented regression using Prais–Winsten regression, which account for the first degree autocorrelation AR(1) in a linear model (more robust than OLS).
model <- prais::prais_winsten(n~ time + covid + time_covid,
index ='time',
data = admission)
summary(model)
##
## Call:
## prais::prais_winsten(formula = n ~ time + covid + time_covid,
## data = admission, index = "time")
##
## Residuals:
## Min 1Q Median 3Q Max
## -6192.5 -1014.0 -25.0 900.6 4853.8
##
## AR(1) coefficient rho after 8 iterations: 0.6349
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 18317.56 1059.94 17.282 < 2e-16 ***
## time 99.73 34.81 2.865 0.00539 **
## covid -198.66 1319.03 -0.151 0.88068
## time_covid -470.30 83.17 -5.655 2.61e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1477 on 76 degrees of freedom
## Multiple R-squared: 0.4807, Adjusted R-squared: 0.4602
## F-statistic: 23.45 on 3 and 76 DF, p-value: 7.555e-11
##
## Durbin-Watson statistic (original): 0.7918
## Durbin-Watson statistic (transformed): 2.15
model$coefficients[1]
## (Intercept)
## 18317.56
model$coefficients[2]
## time
## 99.73493
model$coefficients[3]
## covid
## -198.6638
model$coefficients[4]
## time_covid
## -470.3036
The equation of factual world is \[number\,of\,hospitalization = 18,317.56 + 99.73(time) - 198.66(covid) - 470.30(time\,covid)\] The equation of counter-factual world is \[number\,of\,hospitalization = 18,317.56 + 99.73(time)\]
The average number of hospitalization in province X decreased 199 cases immediately after COVID-19 then decreased significantly 470 cases gradually month by month after COVID-19.
Calculate predicted time series of the number of hospitalization in province X
admission<- admission%>%
mutate(factual_trend= model$coefficients[1]+
model$coefficients[2]*time+
model$coefficients[3]*covid+
model$coefficients[4]*time_covid
)%>%
mutate(counter_fact = model$coefficients[1]+
model$coefficients[2]*time)
DT::datatable(admission)
Then plot time series
ggplot(data = admission)+
##Plotting line chart
geom_point(aes(x = timepoint, y = n))+
geom_line(aes(x = timepoint, y = n),group =1)+
##Plotting vertical line to segmented the time series (before and after COVID-19)
geom_vline(xintercept = 'Mar-20', linetype="dotdash", color ='red',size=1.2)+
##plotting factual trend
geom_line(data = admission%>%filter(time>=51)
,aes(x = timepoint, y = factual_trend),group =1, linetype="dashed", color ='blue',size=1.2)+
##plotting counter-factual trend
geom_line(aes(x = timepoint, y = counter_fact),group =1, linetype="dashed", color ='darkorange',size=1.2)+
##Following code is aesthetic adjustment
scale_y_continuous(limits = c(0,30000),labels = comma)+
xlab('Time')+
ylab('Number of admission')+
labs(title = 'Time series of the number of hospitalization in province X from Jan 2016 to Aug 2022')+
theme_bw()+
theme(axis.text.x = element_text(angle = 90, vjust = 0.3,size = 16),
axis.text = element_text(size=16),
axis.title = element_text(size = 16),
plot.title = element_text(size=20))
The orange dashed line represent underlying trend of the number of hospitalization in province x before COVID-19 occurred and project after COVID-19 (counter-factual) with the assumption of “the level and trend of the number of hospitalization would not be altered if there were no COVID-19”. While the blue dashed line is the trend of the number of hospitalization in province x after COVID-19 (factual). The alteration in level between orange and blue dashed line represent immediate impact of COVID-19 on the number of hospitalization and the alteration of slope represent ongoing impact of COVID-19.