Prepare required package

require(tidyverse)
require(scales)
require(prais)
require(DT)

Import dataset into R

Please define your own file path in the ().

admission <- read_csv(r'(D:\Data Science\FETP\Tutorial topic\interrupted_time_series\admission.csv)')
DT::datatable(admission)

This dataset is a time series of the number of hospitalization occurred in province X from Jan 2016 to Aug 2022, 50 months before and 30 months after COVID-19 occurred in this province (Mar 2020).

We want to determine the impact of COVID-19 on the number of hospitalization in province X by performing Interrupted Time Series (ITS) analysis by using segmented regression.

Equation of segmented regression is \[Yt = 𝜷0 + 𝜷1* time + 𝜷2* intervention + 𝜷3* time\;after\;intervention + 𝟄t\] Where

Prepare data

admission <- admission %>%
      mutate(timepoint = factor(timepoint, level=timepoint))%>%
      ## Define variables time, covid (intervention), and time_covid (time after intervention)
      mutate(time = row_number())%>%
      mutate(covid = if_else(time >=51,1,0))%>%
      mutate(time_covid = case_when(covid == 1 ~ row_number()-50, TRUE~0))
DT::datatable(admission )

Time series plot

ggplot(data = admission)+
      ##Plotting line chart
      geom_point(aes(x = timepoint, y = n))+
      geom_line(aes(x = timepoint, y = n),group =1)+
      ##Plotting vertical line to segmented the time series (before and after COVID-19)
      geom_vline(xintercept = 'Mar-20', linetype="dotdash", color ='red',size=1.2)+
      ##Following code is aesthetic adjustment
      scale_y_continuous(limits = c(0,30000),labels = comma)+
      xlab('Time')+
      ylab('Number of admission')+
      labs(title = 'Time series of the number of hospitalization in province X from Jan 2016 to Aug 2022')+
      theme_bw()+
      theme(axis.text.x = element_text(angle = 90, vjust = 0.3,size = 16),
            axis.text = element_text(size=16),
            axis.title = element_text(size = 16),
            plot.title = element_text(size=20))

The time series plot show that the number of hospitalization decreased after COVID-19.

ITS analysis

We will perform ITS analysis, by segmented regression using Prais–Winsten regression, which account for the first degree autocorrelation AR(1) in a linear model (more robust than OLS).

model <- prais::prais_winsten(n~ time + covid + time_covid,
                              index ='time', 
                              data = admission)
summary(model)
## 
## Call:
## prais::prais_winsten(formula = n ~ time + covid + time_covid, 
##     data = admission, index = "time")
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6192.5 -1014.0   -25.0   900.6  4853.8 
## 
## AR(1) coefficient rho after 8 iterations: 0.6349
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 18317.56    1059.94  17.282  < 2e-16 ***
## time           99.73      34.81   2.865  0.00539 ** 
## covid        -198.66    1319.03  -0.151  0.88068    
## time_covid   -470.30      83.17  -5.655 2.61e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1477 on 76 degrees of freedom
## Multiple R-squared:  0.4807, Adjusted R-squared:  0.4602 
## F-statistic: 23.45 on 3 and 76 DF,  p-value: 7.555e-11
## 
## Durbin-Watson statistic (original): 0.7918 
## Durbin-Watson statistic (transformed):  2.15
model$coefficients[1]
## (Intercept) 
##    18317.56
model$coefficients[2]
##     time 
## 99.73493
model$coefficients[3]
##     covid 
## -198.6638
model$coefficients[4]
## time_covid 
##  -470.3036

The equation of factual world is \[number\,of\,hospitalization = 18,317.56 + 99.73(time) - 198.66(covid) - 470.30(time\,covid)\] The equation of counter-factual world is \[number\,of\,hospitalization = 18,317.56 + 99.73(time)\]

Interpretation

The average number of hospitalization in province X decreased 199 cases immediately after COVID-19 then decreased significantly 470 cases gradually month by month after COVID-19.

ITS plot

Calculate predicted time series of the number of hospitalization in province X

admission<- admission%>%
      mutate(factual_trend= model$coefficients[1]+
                   model$coefficients[2]*time+
                   model$coefficients[3]*covid+
                   model$coefficients[4]*time_covid
                   )%>%
      mutate(counter_fact = model$coefficients[1]+
                   model$coefficients[2]*time)
DT::datatable(admission)

Then plot time series

ggplot(data = admission)+
      ##Plotting line chart
      geom_point(aes(x = timepoint, y = n))+
      geom_line(aes(x = timepoint, y = n),group =1)+
      ##Plotting vertical line to segmented the time series (before and after COVID-19)
      geom_vline(xintercept = 'Mar-20', linetype="dotdash", color ='red',size=1.2)+
      ##plotting factual trend
      geom_line(data = admission%>%filter(time>=51)
                ,aes(x = timepoint, y = factual_trend),group =1, linetype="dashed", color ='blue',size=1.2)+
      ##plotting counter-factual trend
      geom_line(aes(x = timepoint, y = counter_fact),group =1, linetype="dashed", color ='darkorange',size=1.2)+
      ##Following code is aesthetic adjustment
      scale_y_continuous(limits = c(0,30000),labels = comma)+
      xlab('Time')+
      ylab('Number of admission')+
      labs(title = 'Time series of the number of hospitalization in province X from Jan 2016 to Aug 2022')+
      theme_bw()+
      theme(axis.text.x = element_text(angle = 90, vjust = 0.3,size = 16),
            axis.text = element_text(size=16),
            axis.title = element_text(size = 16),
            plot.title = element_text(size=20))

The orange dashed line represent underlying trend of the number of hospitalization in province x before COVID-19 occurred and project after COVID-19 (counter-factual) with the assumption of “the level and trend of the number of hospitalization would not be altered if there were no COVID-19”. While the blue dashed line is the trend of the number of hospitalization in province x after COVID-19 (factual). The alteration in level between orange and blue dashed line represent immediate impact of COVID-19 on the number of hospitalization and the alteration of slope represent ongoing impact of COVID-19.